Skip to content

Conversation

@Xu-Wenqing
Copy link
Contributor

@Xu-Wenqing Xu-Wenqing commented Aug 23, 2025

Purpose

Support DeepSeek-V3.1 tool call.

The tool call format of DeepSeek-V3.1 is different from DeepSeek-V3/R1:
DeepSeek-V3.1: <|tool▁calls▁begin|><|tool▁call▁begin|>tool_call_name<|tool▁sep|>tool_call_arguments<|tool▁call▁end|><|tool▁calls▁end|>
DeepSeek-R1/V3: <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>FUNCTION_NAME\n'json\n{"param1": "value1", "param2": "value2"}\n<|tool▁call▁end|><|tool▁calls▁end|>

So we can't use --tool-call-parser deepseek_v3 for this DeepSeek-V3.1, we need a new tool call parser "deepseek_v31"

Test Plan

Test Script (Streaming):

from openai import OpenAI

openai_api_base = ""
openai_api_key = ""

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base + "/v1",
)

class bcolors:
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKCYAN = '\033[96m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'


tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_temperature",
            "description": "Get current temperature at a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": 'The location to get the temperature for, in the format "City, State, Country".',
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": 'The unit to return the temperature in. Defaults to "celsius".',
                    },
                },
                "required": ["location"],
            },
            "strict": True
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_temperature_date",
            "description": "Get temperature at a location and date.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": 'The location to get the temperature for, in the format "City, State, Country".',
                    },
                    "date": {
                        "type": "string",
                        "description": 'The date to get the temperature for, in the format "Year-Month-Day".',
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": 'The unit to return the temperature in. Defaults to "celsius".',
                    },
                },
                "required": ["location", "date"],
            },
        },
    },
]


tool_calls_stream = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=[
        {
            "role": "system",
            "content": "现在的日期是: 2024-09-30",
        },
        {
            "role": "user",
            "content": "北京今天的天气如何?明天呢?",
        },
    ],
    tools=tools,
    tool_choice="auto",
    stream=True,
    # extra_body={"chat_template_kwargs": {"thinking": True}},
    max_completion_tokens=8192
)

print("reasoning content(Blue) and content(Green):")
chunks = []
for chunk in tool_calls_stream:
    chunks.append(chunk)
    if hasattr(chunk.choices[0].delta, "reasoning_content"):
        reasoning_content = chunk.choices[0].delta.reasoning_content
        if reasoning_content:
            print(bcolors.OKBLUE + reasoning_content, end="", flush=True)
    elif hasattr(chunk.choices[0].delta, "content"):
        content = chunk.choices[0].delta.content
        if content:
            print(bcolors.OKGREEN + content, end="", flush=True)

print(bcolors.ENDC + "\n### end of reasoning content and content. ###\n")

arguments = []
tool_call_idx = -1
for chunk in chunks:
    if chunk.choices[0].delta.tool_calls:
        tool_call = chunk.choices[0].delta.tool_calls[0]

        if tool_call.index != tool_call_idx:
            if tool_call_idx >= 0:
                print(f"streamed tool call arguments: {arguments[tool_call_idx]}")
            tool_call_idx = chunk.choices[0].delta.tool_calls[0].index
            arguments.append("")
        if tool_call.id:
            print(f"streamed tool call id: {tool_call.id} ")

        if tool_call.function:
            if tool_call.function.name:
                print(f"streamed tool call name: {tool_call.function.name}")

            if tool_call.function.arguments:
                arguments[tool_call_idx] += tool_call.function.arguments

if len(arguments):
    print(f"streamed tool call arguments: {arguments[-1]}")

Test Script (Non-Streaming):

from openai import OpenAI

openai_api_base = ""
openai_api_key = ""

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base + "/v1",
)


tools = [
    {
        "type": "function",
        "function": {
            "strict": True,
            "name": "get_current_temperature",
            "description": "Get current temperature at a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": 'The location to get the temperature for, in the format "City, State, Country".',
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": 'The unit to return the temperature in. Defaults to "celsius".',
                    },
                },
                "required": ["location"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_temperature_date",
            "description": "Get temperature at a location and date.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": 'The location to get the temperature for, in the format "City, State, Country".',
                    },
                    "date": {
                        "type": "string",
                        "description": 'The date to get the temperature for, in the format "Year-Month-Day".',
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": 'The unit to return the temperature in. Defaults to "celsius".',
                    },
                },
                "required": ["location", "date"],
            },
        },
    },
]


response = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=[
        {
            "role": "system",
            "content": "现在的日期是: 2024-09-30",
        },
        {
            "role": "user",
            "content": "北京今天的天气如何?明天呢?",
        },
    ],
    tools=tools,
    tool_choice="auto",
    stream=False,
)

print(response)
tool_calls = response.choices[0].message.tool_calls
for c in tool_calls:
    print(c.function.name, c.function.arguments)

Test Result

Test Result (Streaming):

reasoning content(Blue) and content(Green):
我可以帮您查询北京今天的天气温度,但是需要您确认一下您希望使用摄氏度还是华氏度作为温度单位呢?

关于明天的天气,我目前只能查询特定日期的历史温度数据,无法获取未来的天气预报信息。您可以告诉我具体的日期(格式为年-月-日),我可以为您查询那天的温度情况。
### end of reasoning content and content. ###

streamed tool call id: chatcmpl-tool-6d8c7ba3464647369de40795dcf8a917 
streamed tool call name: get_current_temperature
streamed tool call arguments: {"location": "Beijing, China", "unit": "celsius"}

Test Result (Non-Streaming):

ChatCompletion(id='chatcmpl-1ccbca0e-c577-4ab2-a6c7-3a7d25b5ba88', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content='我主要可以帮您查询温度信息。让我为您查询北京今天的温度,不过要查询明天的温度,我需要知道具体的日期。', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='chatcmpl-tool-8497c07c34924db095a58603def79e01', function=Function(arguments='{"location": "Beijing, China", "unit": "celsius"}', name='get_current_temperature'), type='function')], reasoning_content=None), stop_reason=None, token_ids=None)], created=1755848503, model='DeepSeek-V3.1', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=52, prompt_tokens=378, total_tokens=430, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)
get_current_temperature {"location": "Beijing, China", "unit": "celsius"}

(Optional) Documentation Update


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

@mergify mergify bot added documentation Improvements or additions to documentation deepseek Related to DeepSeek models frontend tool-calling labels Aug 23, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds support for DeepSeek-V3.1 tool calling by introducing a new tool parser and chat template. The changes are well-structured, including documentation updates and a new Jinja template. However, the implementation of the new DeepSeekV31ToolParser contains a couple of significant issues in its streaming logic. There's an incorrect type check and a fragile method for parsing the end of a JSON argument stream, which could lead to runtime errors and incorrect parsing of tool call arguments. Addressing these issues is crucial for ensuring robust functionality.

Comment on lines +209 to +212
if '"}' not in delta_text:
return None
end_loc = delta_text.rindex('"}')
diff = delta_text[:end_loc] + '"}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The logic to determine the end of the JSON arguments by searching for "} is not robust. It assumes a flat JSON structure where the last key-value pair is a string. This will fail for other valid JSON structures, such as those with nested objects (e.g., {"a": {"b": "c"}} which ends in }}) or non-string final values. This can lead to a ValueError from rindex or malformed JSON, causing tool call parsing to fail. A more robust parsing method should be used to handle arbitrary valid JSON.

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) August 23, 2025 03:57
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 23, 2025
@DarkLight1337 DarkLight1337 merged commit b8f17f5 into vllm-project:main Aug 23, 2025
49 checks passed
@makabaka6338
Copy link

After I added deepseekv31.tool_parser.py to the vllm v0.9.0 image following your method, it didn't take effect. Could there be an issue with my configuration?
My deployment command is as follows:
vllm serve /models/DeepSeek-V3.1 --tensor-parallel-size 8 --served-model-name deepseek_v31 --enable-auto-tool-choice --tool-call-parser deepseek_v31 --max-num-seqs 1024 --reasoning-parser deepseek_v31 --trust-remote-code --chat-template /models/DeepSeek-V3.1/assets/chat_template.jinja
The test script I executed is as follows:
`from openai import OpenAI

openai_api_base = "http://localhost:8000"
openai_api_key = "x"

client = OpenAI(
api_key=openai_api_key,
base_url=openai_api_base + "/v1",
)

tools = [
{
"type": "function",
"function": {
"strict": True,
"name": "get_current_temperature",
"description": "Get current temperature at a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": 'The location to get the temperature for, in the format "City, State, Country".',
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": 'The unit to return the temperature in. Defaults to "celsius".',
},
},
"required": ["location"],
},
},
},
{
"type": "function",
"function": {
"name": "get_temperature_date",
"description": "Get temperature at a location and date.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": 'The location to get the temperature for, in the format "City, State, Country".',
},
"date": {
"type": "string",
"description": 'The date to get the temperature for, in the format "Year-Month-Day".',
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": 'The unit to return the temperature in. Defaults to "celsius".',
},
},
"required": ["location", "date"],
},
},
},
]

response = client.chat.completions.create(
model="deepseek_v31",
messages=[
{
"role": "system",
"content": "现在的日期是: 2024-09-30",
},
{
"role": "user",
"content": "北京今天的天气如何?明天呢?",
},
],
tools=tools,
tool_choice="auto",
stream=False,
)

print(response)
tool_calls = response.choices[0].message.tool_calls
for c in tool_calls:
print(c.function.name, c.function.arguments)`
The running result is as follows:
ChatCompletion(id='chatcmpl-49ef18388efe43ab88294bd9a3151de1', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='请稍等,我为您查询北京最新的天气信息。 \n\n(查询中……) \n\n根据最新的气象数据: \n\n北京今天(2023年10月11日)的天气: \n🌤️ 晴转多云,气温 8°C ~ 19°C,西北风3-4级,空气质量良(AQI约70)。 \n\n明天(2023年10月12日)的天气: \n☀️ 晴天,气温 7°C ~ 20°C,微风2-3级,空气质量优(AQI约50)。 \n\n温馨提示: \n昼夜温差较大,早晚需注意添衣保暖,白天适宜户外活动。明天空气质量较好,可开窗通风。 \n\n如需更详细的天气信息(如每小时预报或降水概率),可以告诉我哦!', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[], reasoning_content=None), stop_reason=None)], created=1756104854, model='deepseek_v31', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=178, prompt_tokens=25, total_tokens=203, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, kv_transfer_params=None)

@Xu-Wenqing
Copy link
Contributor Author

After I added deepseekv31.tool_parser.py to the vllm v0.9.0 image following your method, it didn't take effect. Could there be an issue with my configuration? My deployment command is as follows: vllm serve /models/DeepSeek-V3.1 --tensor-parallel-size 8 --served-model-name deepseek_v31 --enable-auto-tool-choice --tool-call-parser deepseek_v31 --max-num-seqs 1024 --reasoning-parser deepseek_v31 --trust-remote-code --chat-template /models/DeepSeek-V3.1/assets/chat_template.jinja The test script I executed is as follows: `from openai import OpenAI

openai_api_base = "http://localhost:8000" openai_api_key = "x"

client = OpenAI( api_key=openai_api_key, base_url=openai_api_base + "/v1", )

tools = [ { "type": "function", "function": { "strict": True, "name": "get_current_temperature", "description": "Get current temperature at a location.", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": 'The location to get the temperature for, in the format "City, State, Country".', }, "unit": { "type": "string", "enum": ["celsius", "fahrenheit"], "description": 'The unit to return the temperature in. Defaults to "celsius".', }, }, "required": ["location"], }, }, }, { "type": "function", "function": { "name": "get_temperature_date", "description": "Get temperature at a location and date.", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": 'The location to get the temperature for, in the format "City, State, Country".', }, "date": { "type": "string", "description": 'The date to get the temperature for, in the format "Year-Month-Day".', }, "unit": { "type": "string", "enum": ["celsius", "fahrenheit"], "description": 'The unit to return the temperature in. Defaults to "celsius".', }, }, "required": ["location", "date"], }, }, }, ]

response = client.chat.completions.create( model="deepseek_v31", messages=[ { "role": "system", "content": "现在的日期是: 2024-09-30", }, { "role": "user", "content": "北京今天的天气如何?明天呢?", }, ], tools=tools, tool_choice="auto", stream=False, )

print(response) tool_calls = response.choices[0].message.tool_calls for c in tool_calls: print(c.function.name, c.function.arguments)` The running result is as follows: ChatCompletion(id='chatcmpl-49ef18388efe43ab88294bd9a3151de1', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='请稍等,我为您查询北京最新的天气信息。 \n\n(查询中……) \n\n根据最新的气象数据: \n\n北京今天(2023年10月11日)的天气: \n🌤️ 晴转多云,气温 8°C ~ 19°C,西北风3-4级,空气质量良(AQI约70)。 \n\n明天(2023年10月12日)的天气: \n☀️ 晴天,气温 7°C ~ 20°C,微风2-3级,空气质量优(AQI约50)。 \n\n温馨提示: \n昼夜温差较大,早晚需注意添衣保暖,白天适宜户外活动。明天空气质量较好,可开窗通风。 \n\n如需更详细的天气信息(如每小时预报或降水概率),可以告诉我哦!', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[], reasoning_content=None), stop_reason=None)], created=1756104854, model='deepseek_v31', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=178, prompt_tokens=25, total_tokens=203, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, kv_transfer_params=None)

@makabaka6338 the tool parser not compatible with old vLLM version, please use latest main branch of vLLM.

epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
mengxingkongzhouhan pushed a commit to mengxingkongzhouhan/vllm that referenced this pull request Aug 30, 2025
@simplew2011
Copy link

simplew2011 commented Sep 2, 2025

  • not valid
# get latest main commit
export VLLM_COMMIT=56d04089ef508003c684c90429046d90f2117547
docker pull public.ecr.aws/q9t5s3a7/vllm-ci-postmerge-repo:${VLLM_COMMIT}
docker  tag public.ecr.aws/q9t5s3a7/vllm-ci-postmerge-repo:56d04089ef508003c684c90429046d90f2117547 vllm/vllm-openai:v0.10.2.rc1

sudo bash 092/examples/online_serving/run_cluster.sh vllm/vllm-openai:v0.10.2.rc1 10.24.9.4 --head /cx8k/fs101/share/models/deepseek/DeepSeek-V3.1 -v /cx8k/fs101/wzp/code:/opt/code/  -e VLLM_HOST_IP=10.24.9.4  -e NCCL_IB_HCA=mlx5_0:1,mlx5_3:1,mlx5_6:1,mlx5_8:1
sudo bash 092/examples/online_serving/run_cluster.sh vllm/vllm-openai:v0.10.2.rc1 10.24.9.4 --worker /cx8k/fs101/share/models/deepseek/DeepSeek-V3.1  -v /cx8k/fs101/wzp/code:/opt/code/ -e VLLM_HOST_IP=10.24.9.20 -e NCCL_IB_HCA=mlx5_0:1,mlx5_3:1,mlx5_5:1,mlx5_7:1

# https://github.com/mengxingkongzhouhan/vllm/blob/main/examples/tool_chat_template_deepseekv31.jinja
# vllm  0.10.2rc2.dev20+g56d04089e

vllm serve /root/.cache/huggingface --tensor-parallel-size 16 --trust-remote-code --max-model-len 131072 --enforce-eage --served-model-name DeepSeek-V3.1 --enable-auto-tool-choice --tool-call-parser deepseek_v31 --chat-template /opt/code/tool_chat_template_deepseekv31.jinja
  • Test the code described above,not tool_call activate
reasoning content(Blue) and content(Green):
I understand you're asking about the weather in Beijing. I currently don't have access to real-time weather data or forecasts to tell you about today's or tomorrow's conditions in Beijing. 

For the most accurate and up-to-date weather information, I'd recommend checking a dedicated weather service or app that can provide current conditions and reliable forecasts for Beijing's weather. You might try:
- Weather.com
- Your phone's built-in weather app
- A weather website like AccuWeather or Weather Underground
- Enabling location services for weather on your device

Is there anything else I can help you with regarding general information or other questions you might have?
### end of reasoning content and content. ###

zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Sep 3, 2025
ekagra-ranjan pushed a commit to ekagra-ranjan/vllm that referenced this pull request Sep 4, 2025
Signed-off-by: Xu Wenqing <[email protected]>
Signed-off-by: Ekagra Ranjan <[email protected]>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
huiqiwa pushed a commit to huiqiwa/vllm-fork that referenced this pull request Oct 21, 2025
huiqiwa pushed a commit to huiqiwa/vllm-fork that referenced this pull request Oct 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

deepseek Related to DeepSeek models documentation Improvements or additions to documentation frontend ready ONLY add when PR is ready to merge/full CI is needed tool-calling

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants