### Your current environment

The output of `python collect_env.py`
Your output of `python collect_env.py` here
### 🐛 Describe the bug
(Worker_TP1 pid=43610) INFO 10-23 09:29:27 [gpu_model_runner.py:2602] Starting to load model /media/kwaishop-langbridge-kfs/chenjunjie/live-product-matcher-reject/sft_train/ms-swift/qwen3_vl_2b_1023_sft_afternoon_prompt_v6_merge...
(Worker_TP1 pid=43610) INFO 10-23 09:29:27 [gpu_model_runner.py:2634] Loading model from scratch...
(Worker_TP1 pid=43610) INFO 10-23 09:29:28 [cuda.py:366] Using Flash Attention backend on V1 engine.
(Worker_TP0 pid=43609) INFO 10-23 09:29:28 [gpu_model_runner.py:2602] Starting to load model /media/kwaishop-langbridge-kfs/chenjunjie/live-product-matcher-reject/sft_train/ms-swift/qwen3_vl_2b_1023_sft_afternoon_prompt_v6_merge...
(Worker_TP0 pid=43609) INFO 10-23 09:29:28 [gpu_model_runner.py:2634] Loading model from scratch...
(Worker_TP0 pid=43609) INFO 10-23 09:29:29 [cuda.py:366] Using Flash Attention backend on V1 engine.
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
(Worker_TP1 pid=43610) INFO 10-23 09:29:30 [default_loader.py:267] Loading weights took 2.22 seconds
(Worker_TP1 pid=43610) INFO 10-23 09:29:31 [gpu_model_runner.py:2653] Model loading took 2.7054 GiB and 2.993759 seconds
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:02<00:00, 2.08s/it]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:02<00:00, 2.08s/it]
(Worker_TP0 pid=43609)
(Worker_TP0 pid=43609) INFO 10-23 09:29:32 [default_loader.py:267] Loading weights took 2.27 seconds
(Worker_TP0 pid=43609) INFO 10-23 09:29:33 [gpu_model_runner.py:2653] Model loading took 2.7054 GiB and 3.420745 seconds
(Worker_TP0 pid=43609) INFO 10-23 09:29:33 [gpu_model_runner.py:3344] Encoder cache will be initialized with a budget of 16384 tokens, and profiled with 1 image items of the maximum feature size.
(Worker_TP1 pid=43610) INFO 10-23 09:29:33 [gpu_model_runner.py:3344] Encoder cache will be initialized with a budget of 16384 tokens, and profiled with 1 image items of the maximum feature size.
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] WorkerProc hit an exception.
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] Traceback (most recent call last):
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 666, in worker_busy_loop
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] output = func(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return func(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 263, in determine_available_memory
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] self.model_runner.profile_run()
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3361, in profile_run
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] self.model.get_multimodal_embeddings(
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_vl.py", line 1378, in get_multimodal_embeddings
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] vision_embeddings = self._process_image_input(multimodal_input)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_vl.py", line 1301, in _process_image_input
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return run_dp_sharded_mrope_vision_model(self.visual,
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/vision.py", line 338, in run_dp_sharded_mrope_vision_model
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] image_embeds_local = vision_model(pixel_values_local,
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_vl.py", line 517, in forward
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] hidden_states = blk(hidden_states,
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_vl.py", line 200, in forward
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] x = x + self.attn(self.norm1(x),
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen2_5_vl.py", line 369, in forward
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] output = flash_attn_varlen_func(q,
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/vllm_flash_attn/flash_attn_interface.py", line 233, in flash_attn_varlen_func
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] out, softmax_lse = torch.ops._vllm_fa2_C.varlen_fwd(
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/_ops.py", line 1243, in __call__
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return self._op(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] torch.AcceleratorError: CUDA error: the provided PTX was compiled with an unsupported toolchain.
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671]
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] Traceback (most recent call last):
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 666, in worker_busy_loop
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] output = func(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return func(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 263, in determine_available_memory
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] self.model_runner.profile_run()
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3361, in profile_run
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] self.model.get_multimodal_embeddings(
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_vl.py", line 1378, in get_multimodal_embeddings
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] vision_embeddings = self._process_image_input(multimodal_input)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_vl.py", line 1301, in _process_image_input
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return run_dp_sharded_mrope_vision_model(self.visual,
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/vision.py", line 338, in run_dp_sharded_mrope_vision_model
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] image_embeds_local = vision_model(pixel_values_local,
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_vl.py", line 517, in forward
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] hidden_states = blk(hidden_states,
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_vl.py", line 200, in forward
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] x = x + self.attn(self.norm1(x),
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen2_5_vl.py", line 369, in forward
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] output = flash_attn_varlen_func(q,
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/vllm_flash_attn/flash_attn_interface.py", line 233, in flash_attn_varlen_func
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] out, softmax_lse = torch.ops._vllm_fa2_C.varlen_fwd(
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/_ops.py", line 1243, in __call__
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return self._op(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] torch.AcceleratorError: CUDA error: the provided PTX was compiled with an unsupported toolchain.
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671]
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671]
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 92, in __init__
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] self._initialize_kv_caches(vllm_config)
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 190, in _initialize_kv_caches
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] self.model_executor.determine_available_memory())
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 85, in determine_available_memory
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 262, in collective_rpc
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] result = result.result()
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] ^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/concurrent/futures/_base.py", line 456, in result
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] return self.__get_result()
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] raise self._exception
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/concurrent/futures/thread.py", line 59, in run
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] result = self.fn(*self.args, **self.kwargs)
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 248, in get_response
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] raise RuntimeError(
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] RuntimeError: Worker failed with error 'CUDA error: the provided PTX was compiled with an unsupported toolchain.
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] ', please check the stack trace above for the root cause
[rank1]:[W1023 09:29:42.972180911 TCPStore.cpp:138] [c10d] recvValueWithTimeout failed on SocketImpl(fd=90, addr=[localhost]:54338, remote=[localhost]:45167): Failed to recv, got 0 bytes. Connection was likely closed. Did the remote server shutdown or crash?
Exception raised from recvBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:682 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0x80 (0x7f89ca37eeb0 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: + 0x5d694d1 (0x7f89ae3ef4d1 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #2: + 0x5d6ab2d (0x7f89ae3f0b2d in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #3: + 0x5d6b1e9 (0x7f89ae3f11e9 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #4: c10d::TCPStore::doWait(c10::ArrayRef<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::chrono::duration<long, std::ratio<1l, 1000l> >) + 0x1c6 (0x7f89ae3ec5f6 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #5: c10d::TCPStore::doGet(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0x33 (0x7f89ae3edfe3 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #6: c10d::TCPStore::get(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0x114 (0x7f89ae3ef0f4 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #7: c10d::PrefixStore::get(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0x30 (0x7f89ae39d700 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #8: c10d::PrefixStore::get(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0x30 (0x7f89ae39d700 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #9: c10d::PrefixStore::get(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0x30 (0x7f89ae39d700 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #10: c10d::PrefixStore::get(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0x30 (0x7f89ae39d700 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #11: c10d::ProcessGroupNCCL::broadcastUniqueNCCLID(ncclUniqueId, bool, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int) + 0x5c4 (0x7f896d8d9bd4 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #12: c10d::ProcessGroupNCCL::initNCCLComm(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, c10::Device&, c10d::OpType, int, bool) + 0x1bba (0x7f896d8dc67a in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #13: c10d::ProcessGroupNCCL::_allgather_base(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&) + 0x141f (0x7f896d8ffb7f in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #14: + 0x5d080e8 (0x7f89ae38e0e8 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #15: + 0x5d0effe (0x7f89ae394ffe in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #16: + 0x5d2dd0b (0x7f89ae3b3d0b in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #17: + 0xc78589 (0x7f89bd7b6589 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #18: + 0x3767d2 (0x7f89bceb47d2 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #19: VLLM::Worker_TP1() [0x543944]
frame #20: _PyObject_MakeTpCall + 0x2fc (0x51778c in VLLM::Worker_TP1)
frame #21: _PyEval_EvalFrameDefault + 0x6d2 (0x521952 in VLLM::Worker_TP1)
frame #22: + 0x85294a (0x7f89bd39094a in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #23: + 0xb584ab (0x7f89bd6964ab in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #24: + 0x5b4222d (0x7f89ae1c822d in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #25: torch::jit::invokeOperatorFromPython(std::vector<std::shared_ptrtorch::jit::Operator, std::allocator<std::shared_ptrtorch::jit::Operator > > const&, pybind11::args const&, pybind11::kwargs const&, std::optionalc10::DispatchKey) + 0x394 (0x7f89bd449a04 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #26: torch::jit::_get_operation_for_overload_or_packet(std::vector<std::shared_ptrtorch::jit::Operator, std::allocator<std::shared_ptrtorch::jit::Operator > > const&, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optionalc10::DispatchKey) + 0x1a9 (0x7f89bd449dc9 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #27: + 0x81567a (0x7f89bd35367a in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #28: + 0x3767d2 (0x7f89bceb47d2 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #29: VLLM::Worker_TP1() [0x543944]
frame #30: _PyObject_Call + 0xb5 (0x555cd5 in VLLM::Worker_TP1)
frame #31: _PyEval_EvalFrameDefault + 0x53fe (0x52667e in VLLM::Worker_TP1)
frame #32: _PyObject_FastCallDictTstate + 0x285 (0x519fc5 in VLLM::Worker_TP1)
frame #33: _PyObject_Call_Prepend + 0x66 (0x552ff6 in VLLM::Worker_TP1)
frame #34: VLLM::Worker_TP1() [0x6282e6]
frame #35: _PyObject_MakeTpCall + 0x2fc (0x51778c in VLLM::Worker_TP1)
frame #36: _PyEval_EvalFrameDefault + 0x6d2 (0x521952 in VLLM::Worker_TP1)
frame #37: VLLM::Worker_TP1() [0x56d11d]
frame #38: VLLM::Worker_TP1() [0x56ccad]
frame #39: _PyObject_Call + 0x122 (0x555d42 in VLLM::Worker_TP1)
frame #40: _PyEval_EvalFrameDefault + 0x53fe (0x52667e in VLLM::Worker_TP1)
frame #41: VLLM::Worker_TP1() [0x56d11d]
frame #42: VLLM::Worker_TP1() [0x56cce0]
frame #43: _PyEval_EvalFrameDefault + 0x53fe (0x52667e in VLLM::Worker_TP1)
frame #44: PyEval_EvalCode + 0xae (0x5de5ce in VLLM::Worker_TP1)
frame #45: VLLM::Worker_TP1() [0x61b7b7]
frame #46: VLLM::Worker_TP1() [0x616307]
frame #47: PyRun_StringFlags + 0x5f (0x61232f in VLLM::Worker_TP1)
frame #48: PyRun_SimpleStringFlags + 0x3a (0x611eca in VLLM::Worker_TP1)
frame #49: Py_RunMain + 0x4e1 (0x60f801 in VLLM::Worker_TP1)
frame #50: Py_BytesMain + 0x39 (0x5c6bb9 in VLLM::Worker_TP1)
frame #51: + 0x29d90 (0x7f89cb159d90 in /lib/x86_64-linux-gnu/libc.so.6)
frame #52: __libc_start_main + 0x80 (0x7f89cb159e40 in /lib/x86_64-linux-gnu/libc.so.6)
frame #53: VLLM::Worker_TP1() [0x5c69e9]
[rank1]:[W1023 09:29:42.988847516 TCPStore.cpp:125] [c10d] recvValue failed on SocketImpl(fd=90, addr=[localhost]:54338, remote=[localhost]:45167): Failed to recv, got 0 bytes. Connection was likely closed. Did the remote server shutdown or crash?
Exception raised from recvBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:682 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0x80 (0x7f89ca37eeb0 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: + 0x5d694d1 (0x7f89ae3ef4d1 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #2: + 0x5d6a8cd (0x7f89ae3f08cd in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #3: + 0x5d6b47a (0x7f89ae3f147a in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > const&) + 0x31e (0x7f89ae3ec19e in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #5: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x398 (0x7f896d8d1b18 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #6: + 0xdbbf4 (0x7f895125abf4 in /root/miniconda3/envs/qwen3-vl/bin/../lib/libstdc++.so.6)
frame #7: + 0x94ac3 (0x7f89cb1c4ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #8: + 0x126850 (0x7f89cb256850 in /lib/x86_64-linux-gnu/libc.so.6)
[rank1]:[W1023 09:29:42.993390455 ProcessGroupNCCL.cpp:1783] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Failed to recv, got 0 bytes. Connection was likely closed. Did the remote server shutdown or crash?
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:42 [multiproc_executor.py:154] Worker proc VllmWorker-0 died unexpectedly, shutting down executor.
(EngineCore_DP0 pid=43418) Process EngineCore_DP0:
(EngineCore_DP0 pid=43418) Traceback (most recent call last):
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=43418) self.run()
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=43418) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=43418) raise e
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=43418) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=43418) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=43418) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 92, in __init__
(EngineCore_DP0 pid=43418) self._initialize_kv_caches(vllm_config)
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 190, in _initialize_kv_caches
(EngineCore_DP0 pid=43418) self.model_executor.determine_available_memory())
(EngineCore_DP0 pid=43418) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 85, in determine_available_memory
(EngineCore_DP0 pid=43418) return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=43418) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 262, in collective_rpc
(EngineCore_DP0 pid=43418) result = result.result()
(EngineCore_DP0 pid=43418) ^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/concurrent/futures/_base.py", line 456, in result
(EngineCore_DP0 pid=43418) return self.__get_result()
(EngineCore_DP0 pid=43418) ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore_DP0 pid=43418) raise self._exception
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/concurrent/futures/thread.py", line 59, in run
(EngineCore_DP0 pid=43418) result = self.fn(*self.args, **self.kwargs)
(EngineCore_DP0 pid=43418) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 248, in get_response
(EngineCore_DP0 pid=43418) raise RuntimeError(
(EngineCore_DP0 pid=43418) RuntimeError: Worker failed with error 'CUDA error: the provided PTX was compiled with an unsupported toolchain.
(EngineCore_DP0 pid=43418) CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(EngineCore_DP0 pid=43418) For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(EngineCore_DP0 pid=43418) Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(EngineCore_DP0 pid=43418) ', please check the stack trace above for the root cause
(APIServer pid=43208) Traceback (most recent call last):
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/bin/vllm", line 7, in <module>
(APIServer pid=43208) sys.exit(main())
(APIServer pid=43208) ^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=43208) args.dispatch_function(args)
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/entrypoints/cli/serve.py", line 57, in cmd
(APIServer pid=43208) uvloop.run(run_server(args))
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/uvloop/__init__.py", line 109, in run
(APIServer pid=43208) return __asyncio.run(
(APIServer pid=43208) ^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=43208) return runner.run(main)
(APIServer pid=43208) ^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=43208) return self._loop.run_until_complete(task)
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=43208) return await main
(APIServer pid=43208) ^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1884, in run_server
(APIServer pid=43208) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1902, in run_server_worker
(APIServer pid=43208) async with build_async_engine_client(
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=43208) return await anext(self.gen)
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client
(APIServer pid=43208) async with build_async_engine_client_from_engine_args(
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=43208) return await anext(self.gen)
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 225, in build_async_engine_client_from_engine_args
(APIServer pid=43208) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/utils/__init__.py", line 1572, in inner
(APIServer pid=43208) return fn(*args, **kwargs)
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 207, in from_vllm_config
(APIServer pid=43208) return cls(
(APIServer pid=43208) ^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 134, in __init__
(APIServer pid=43208) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=43208) return AsyncMPClient(*client_args)
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 769, in __init__
(APIServer pid=43208) super().__init__(
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 448, in __init__
(APIServer pid=43208) with launch_core_engines(vllm_config, executor_class,
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=43208) next(self.gen)
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 732, in launch_core_engines
(APIServer pid=43208) wait_for_engine_startup(
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
(APIServer pid=43208) raise RuntimeError("Engine core initialization failed. "
(APIServer pid=43208) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
How can this be solved?
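For reference, the failure above ("CUDA error: the provided PTX was compiled with an unsupported toolchain") usually means the installed NVIDIA driver is older than the CUDA toolchain that the prebuilt flash-attention kernels loaded by vLLM (`torch.ops._vllm_fa2_C.varlen_fwd` in the traceback) were compiled with. Below is a minimal diagnostic sketch, not part of the original report, that only prints the versions that need to be compared; it uses standard PyTorch attributes and `nvidia-smi`:

```python
# Minimal diagnostic sketch (added for illustration, not from the original report).
# "PTX compiled with an unsupported toolchain" usually means the NVIDIA driver
# is older than the CUDA toolkit the kernels were built against, so the first
# step is to compare the driver version with the PyTorch CUDA build.
import subprocess

import torch

print("torch version:        ", torch.__version__)
print("torch built with CUDA:", torch.version.cuda)        # toolkit of the installed wheel
print("CUDA available:       ", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:               ", torch.cuda.get_device_name(0))

# Driver version reported by nvidia-smi; it must be new enough for the toolkit
# above, otherwise kernels shipped as PTX for a newer toolchain fail to load.
result = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True, text=True,
)
print("NVIDIA driver version:", result.stdout.strip())
```

If the driver turns out to be too old for the reported CUDA build, upgrading the driver (or installing a vLLM/PyTorch build that matches the driver's supported CUDA version) is the usual fix.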
### Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.