Description
Your current environment
The output of python collect_env.py
Collecting environment information...
==============================
System Info
==============================
OS : Ubuntu 24.04.2 LTS (x86_64)
GCC version : (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Clang version : Could not collect
CMake version : version 3.31.6
Libc version : glibc-2.39
==============================
PyTorch Info
==============================
PyTorch version : 2.8.0+cu128
Is debug build : False
CUDA used to build PyTorch : 12.8
ROCM used to build PyTorch : N/A
==============================
Python Environment
==============================
Python version : 3.12.3 (main, Feb 4 2025, 14:48:35) [GCC 13.3.0] (64-bit runtime)
Python platform : Linux-5.15.0-151-generic-x86_64-with-glibc2.39
==============================
CUDA / GPU Info
==============================
Is CUDA available : True
CUDA runtime version : 12.9.41
CUDA_MODULE_LOADING set to : LAZY
GPU models and configuration :
GPU 0: NVIDIA H200
GPU 1: NVIDIA H200
GPU 2: NVIDIA H200
GPU 3: NVIDIA H200
GPU 4: NVIDIA H200
GPU 5: NVIDIA H200
GPU 6: NVIDIA H200
GPU 7: NVIDIA H200
Nvidia driver version : 550.127.08
cuDNN version : Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.9.10.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.10.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.10.1
/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.10.1
/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.10.1
/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.10.1
/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.10.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.10.1
HIP runtime version : N/A
MIOpen runtime version : N/A
Is XNNPACK available : True
==============================
Versions of relevant libraries
==============================
[pip3] mypy_extensions==1.1.0
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.8.4.1
[pip3] nvidia-cuda-cupti-cu12==12.8.90
[pip3] nvidia-cuda-nvrtc-cu12==12.8.93
[pip3] nvidia-cuda-runtime-cu12==12.8.90
[pip3] nvidia-cudnn-cu12==9.10.2.21
[pip3] nvidia-cudnn-frontend==1.11.0
[pip3] nvidia-cufft-cu12==11.3.3.83
[pip3] nvidia-cufile-cu12==1.13.1.3
[pip3] nvidia-curand-cu12==10.3.9.90
[pip3] nvidia-cusolver-cu12==11.7.3.90
[pip3] nvidia-cusparse-cu12==12.5.8.93
[pip3] nvidia-cusparselt-cu12==0.7.1
[pip3] nvidia-dali-cuda120==1.49.0
[pip3] nvidia-ml-py==12.570.86
[pip3] nvidia-modelopt==0.27.1
[pip3] nvidia-modelopt-core==0.27.1
[pip3] nvidia-nccl-cu12==2.27.3
[pip3] nvidia-nvcomp-cu12==4.2.0.14
[pip3] nvidia-nvimgcodec-cu12==0.5.0.13
[pip3] nvidia-nvjitlink-cu12==12.8.93
[pip3] nvidia-nvjpeg-cu12==12.4.0.16
[pip3] nvidia-nvjpeg2k-cu12==0.8.1.40
[pip3] nvidia-nvtiff-cu12==0.5.0.67
[pip3] nvidia-nvtx-cu12==12.8.90
[pip3] nvidia-resiliency-ext==0.3.0
[pip3] onnx==1.17.0
[pip3] optree==0.15.0
[pip3] pynvml==12.0.0
[pip3] pytorch-triton==3.3.0+git96316ce52.nvinternal
[pip3] pyzmq==26.4.0
[pip3] torch==2.8.0
[pip3] torch_tensorrt==2.8.0a0
[pip3] torchaudio==2.8.0
[pip3] torchprofile==0.0.4
[pip3] torchvision==0.23.0
[pip3] transformers==4.57.1
[pip3] triton==3.4.0
[conda] Could not collect
==============================
vLLM Info
==============================
ROCM Version : Could not collect
vLLM Version : 0.11.0
vLLM Build Flags:
CUDA Archs: 7.5 8.0 8.6 9.0 10.0 12.0+PTX; ROCm: Disabled
GPU Topology:
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 NIC0 NIC1 NIC2 NIC3 NIC4 NIC5 NIC6 NIC7 NIC8 NIC9 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NV18 NV18 NV18 NV18 NV18 NV18 NV18 PIX NODE NODE NODE SYS SYS SYS SYS SYS SYS 0-47,96-143 0 N/A
GPU1 NV18 X NV18 NV18 NV18 NV18 NV18 NV18 NODE PIX NODE NODE SYS SYS SYS SYS SYS SYS 0-47,96-143 0 N/A
GPU2 NV18 NV18 X NV18 NV18 NV18 NV18 NV18 NODE NODE PIX NODE SYS SYS SYS SYS SYS SYS 0-47,96-143 0 N/A
GPU3 NV18 NV18 NV18 X NV18 NV18 NV18 NV18 NODE NODE NODE PIX SYS SYS SYS SYS SYS SYS 0-47,96-143 0 N/A
GPU4 NV18 NV18 NV18 NV18 X NV18 NV18 NV18 SYS SYS SYS SYS PIX NODE NODE NODE NODE NODE 48-95,144-191 1 N/A
GPU5 NV18 NV18 NV18 NV18 NV18 X NV18 NV18 SYS SYS SYS SYS NODE PIX NODE NODE NODE NODE 48-95,144-191 1 N/A
GPU6 NV18 NV18 NV18 NV18 NV18 NV18 X NV18 SYS SYS SYS SYS NODE NODE PIX NODE NODE NODE 48-95,144-191 1 N/A
GPU7 NV18 NV18 NV18 NV18 NV18 NV18 NV18 X SYS SYS SYS SYS NODE NODE NODE NODE NODE PIX 48-95,144-191 1 N/A
NIC0 PIX NODE NODE NODE SYS SYS SYS SYS X NODE NODE NODE SYS SYS SYS SYS SYS SYS
NIC1 NODE PIX NODE NODE SYS SYS SYS SYS NODE X NODE NODE SYS SYS SYS SYS SYS SYS
NIC2 NODE NODE PIX NODE SYS SYS SYS SYS NODE NODE X NODE SYS SYS SYS SYS SYS SYS
NIC3 NODE NODE NODE PIX SYS SYS SYS SYS NODE NODE NODE X SYS SYS SYS SYS SYS SYS
NIC4 SYS SYS SYS SYS PIX NODE NODE NODE SYS SYS SYS SYS X NODE NODE NODE NODE NODE
NIC5 SYS SYS SYS SYS NODE PIX NODE NODE SYS SYS SYS SYS NODE X NODE NODE NODE NODE
NIC6 SYS SYS SYS SYS NODE NODE PIX NODE SYS SYS SYS SYS NODE NODE X NODE NODE NODE
NIC7 SYS SYS SYS SYS NODE NODE NODE NODE SYS SYS SYS SYS NODE NODE NODE X PXB NODE
NIC8 SYS SYS SYS SYS NODE NODE NODE NODE SYS SYS SYS SYS NODE NODE NODE PXB X NODE
NIC9 SYS SYS SYS SYS NODE NODE NODE PIX SYS SYS SYS SYS NODE NODE NODE NODE NODE X
Legend:
🐛 Describe the bug
pip install vllm --extra-index-url https://wheels.vllm.ai/nightly
pip install https://wheels.vllm.ai/dsv32/deep_gemm-2.1.0%2B594953a-cp312-cp312-linux_x86_64.whl
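After installing, a quick sanity check can confirm which wheels were actually picked up (a minimal sketch, assuming the packages import cleanly; not part of the original report):

# Quick check of the installed versions used in this report.
import torch
import vllm

print("vLLM:", vllm.__version__)
print("PyTorch:", torch.__version__, "CUDA:", torch.version.cuda)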
server:
vllm serve deepseek-ai/DeepSeek-V3.2-Exp --served-model-name DeepSeek-V3.2-Exp --trust-remote-code --data-parallel-size 8 --enable-expert-parallel --enable-auto-tool-choice --tool-call-parser deepseek_v31 --chat-template ./deepseek_v31.jinjia
Following the test code from #23454.
Test Script (Non-Streaming):
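(The script body itself is not reproduced above; a minimal sketch of the kind of non-streaming tool-call request it issues, in the style of #23454, is shown below. The tool name, schema, prompt, and base_url are illustrative assumptions, not the original script.)

# Minimal non-streaming tool-call request against the local vLLM OpenAI-compatible server.
# NOTE: tool name/schema, base_url, and prompt are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="DeepSeek-V3.2-Exp",
    messages=[{"role": "user", "content": "What is the weather in Berlin?"}],
    tools=tools,
    tool_choice="auto",
)
print(response.choices[0].message)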
Result:
root@pytorch-job-20251015152714476386g2zcyrl9qvtf:/# python3 test_tool_calls.py
Traceback (most recent call last):
File "//test_tool_calls.py", line 65, in <module>
response = client.chat.completions.create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/openai/_utils/_utils.py", line 286, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/openai/resources/chat/completions/completions.py", line 1156, in create
return self._post(
^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/openai/_base_client.py", line 1259, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/openai/_base_client.py", line 1047, in request
raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500 - {'error': {'message': 'EngineCore encountered an issue. See stack trace (above) for the root cause.', 'type': 'Internal Server Error', 'param': None, 'code': 500}}
....
(APIServer pid=538) WARNING 10-15 16:18:23 [protocol.py:93] The following fields were present in the request but ignored: {'strict'}
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] EngineCore encountered a fatal error.
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] Traceback (most recent call last):
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 701, in run_engine_core
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] engine_core.run_busy_loop()
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1056, in run_busy_loop
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] self.execute_dummy_batch()
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 387, in execute_dummy_batch
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] self.model_executor.execute_dummy_batch()
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 109, in execute_dummy_batch
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] self.collective_rpc("execute_dummy_batch")
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] return func(*args, **kwargs)
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 491, in execute_dummy_batch
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] self.model_runner._dummy_run(1, uniform_decode=True)
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] return func(*args, **kwargs)
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3180, in _dummy_run
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] return hidden_states, hidden_states[logit_indices]
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] ~~~~~~~~~~~~~^^^^^^^^^^^^^^^
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] torch.AcceleratorError: CUDA error: an illegal memory access was encountered
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(EngineCore_DP5 pid=826) ERROR 10-15 16:18:24 [core.py:710]
(EngineCore_DP5 pid=826) Process EngineCore_DP5:
(EngineCore_DP5 pid=826) Traceback (most recent call last):
(EngineCore_DP5 pid=826) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP5 pid=826) self.run()
(EngineCore_DP5 pid=826) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP5 pid=826) self._target(*self._args, **self._kwargs)
(EngineCore_DP5 pid=826) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP5 pid=826) raise e
(EngineCore_DP5 pid=826) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 701, in run_engine_core
(EngineCore_DP5 pid=826) engine_core.run_busy_loop()
(EngineCore_DP5 pid=826) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1056, in run_busy_loop
(EngineCore_DP5 pid=826) self.execute_dummy_batch()
(EngineCore_DP5 pid=826) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 387, in execute_dummy_batch
(EngineCore_DP5 pid=826) self.model_executor.execute_dummy_batch()
(EngineCore_DP5 pid=826) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 109, in execute_dummy_batch
(EngineCore_DP5 pid=826) self.collective_rpc("execute_dummy_batch")
(EngineCore_DP5 pid=826) File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP5 pid=826) return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP5 pid=826) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP5 pid=826) File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP5 pid=826) return func(*args, **kwargs)
(EngineCore_DP5 pid=826) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP5 pid=826) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 491, in execute_dummy_batch
(EngineCore_DP5 pid=826) self.model_runner._dummy_run(1, uniform_decode=True)
(EngineCore_DP5 pid=826) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP5 pid=826) return func(*args, **kwargs)
(EngineCore_DP5 pid=826) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP5 pid=826) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3180, in _dummy_run
(EngineCore_DP5 pid=826) return hidden_states, hidden_states[logit_indices]
(EngineCore_DP5 pid=826) ~~~~~~~~~~~~~^^^^^^^^^^^^^^^
(EngineCore_DP5 pid=826) torch.AcceleratorError: CUDA error: an illegal memory access was encountered
(EngineCore_DP5 pid=826) CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(EngineCore_DP5 pid=826) For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(EngineCore_DP5 pid=826) Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(EngineCore_DP5 pid=826)
(APIServer pid=538) ERROR 10-15 16:18:24 [async_llm.py:480] AsyncLLM output_handler failed.
....
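The log suggests re-running with CUDA_LAUNCH_BLOCKING=1 so the illegal-memory-access error is reported at the offending launch site. One way to do that while keeping the exact serve command above is a small launcher sketch (the flags mirror the command in this report; the use of subprocess here is just for illustration, exporting the variable in the shell works the same way):

# Relaunch the same `vllm serve` command with synchronous CUDA error reporting.
# Flags are copied from the reproduction command above; only CUDA_LAUNCH_BLOCKING is added.
import os
import subprocess

env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
subprocess.run(
    [
        "vllm", "serve", "deepseek-ai/DeepSeek-V3.2-Exp",
        "--served-model-name", "DeepSeek-V3.2-Exp",
        "--trust-remote-code",
        "--data-parallel-size", "8",
        "--enable-expert-parallel",
        "--enable-auto-tool-choice",
        "--tool-call-parser", "deepseek_v31",
        "--chat-template", "./deepseek_v31.jinjia",
    ],
    env=env,
    check=True,
)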