[Bugfix] Mistral tool parser streaming update #19425
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.
Just a reminder: PRs do not trigger a full CI run by default; only a small, essential subset of CI tests runs automatically to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. 🚀
Summary of Changes
Hello @avigny, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request addresses an issue related to streaming tool calls for Mistral models by replacing the previous partial_json_parser-based implementation with a more robust, custom stateful parser. This new approach aims to accurately extract tool call information, including names and arguments, as tokens are streamed, improving the reliability of tool use functionality in streaming mode. The changes include a significant rewrite of the parsing logic and the addition of comprehensive test coverage.
Highlights
- Refactored Streaming Parser: The core logic for parsing Mistral tool calls during streaming has been completely rewritten.
- Removed Dependency: The dependency on the `partial_json_parser` library for streaming tool call extraction has been removed.
- New Parsing Mechanism: Introduced a custom stateful parsing mechanism using regex and `json.JSONDecoder.raw_decode` to incrementally extract tool call names and arguments from the raw token stream (a minimal illustration follows this list).
- Comprehensive Tests: Added a new, extensive test file (`tests/tool_use/test_mistral_tool_parser.py`) with various test cases covering both streaming and non-streaming scenarios for Mistral tool calls, including single and multiple tool calls with different argument structures.
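For illustration, here is a minimal, hedged sketch of the incremental-extraction idea described above. It is not the PR's actual code: the tool-call shape (a name immediately followed by a JSON object) and the regex are assumptions made for the example.

```python
import json
import re

decoder = json.JSONDecoder()
# Assumed shape for this sketch: a tool name immediately followed by its
# JSON arguments, e.g. 'calculator{"calculation": "2+2"}'.
name_re = re.compile(r"^([A-Za-z0-9_-]+)\s*\{")

def try_parse(buffer: str):
    """Return (name, args) once the arguments JSON is complete, else None."""
    match = name_re.match(buffer)
    if not match:
        return None  # tool name (or opening brace) not seen yet
    args_start = buffer.find("{", match.start())
    try:
        # raw_decode tolerates trailing text, so it works on a growing buffer.
        args, _end = decoder.raw_decode(buffer[args_start:])
    except json.JSONDecodeError:
        return None  # arguments still incomplete; wait for more tokens
    return match.group(1), args

print(try_parse('calculator{"calculation": "954'))  # None (incomplete)
print(try_parse('calculator{"calculation": "95478415 / 4571"}'))
# -> ('calculator', {'calculation': '95478415 / 4571'})
```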
Code Review
This pull request refactors the streaming tool call parsing logic for Mistral models and adds a comprehensive test suite. The core change involves replacing `partial_json_parser` with a custom regex and `json.raw_decode`-based approach for more fine-grained control over the streaming process. The new tests cover a variety of scenarios. The review includes stylistic suggestions for the tests and points for consideration regarding complexity and state management in the new parsing logic.
Tests are similar to the ones added for Jamba models in vllm-project#9154.
Force-pushed from c468495 to d6d17c1
@hibukipanim I did run the test you provided in your issue description #17585 (comment) and got the following output: ChoiceDeltaToolCall(index=0, id='j6OY9szTS', function=ChoiceDeltaToolCallFunction(arguments=None, name='mcp_confluence'), type='function')
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='{"', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='query', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='":', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments=' "', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='co', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='ffee', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='",', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments=' "', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='limit', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='":', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments=' ', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='1', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='}', name=None), type=None)
It seems to fix your issue.
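As context for readers, here is a hedged sketch of how streamed deltas like the ones above are typically reassembled on the client side. The endpoint URL, model name, and tool definition are placeholders, not taken from this thread.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholder vLLM endpoint

tools = [{  # placeholder tool definition
    "type": "function",
    "function": {"name": "mcp_confluence", "parameters": {"type": "object", "properties": {}}},
}]

stream = client.chat.completions.create(
    model="mistralai/Mistral-Small-3.2-24B-Instruct-2506",  # example model
    messages=[{"role": "user", "content": "Search Confluence for coffee"}],
    tools=tools,
    stream=True,
)

# Accumulate per-index tool calls: the first delta carries the name, the
# rest carry argument fragments that must be concatenated in order.
calls: dict[int, dict] = {}
for chunk in stream:
    if not chunk.choices:
        continue
    for tc in chunk.choices[0].delta.tool_calls or []:
        entry = calls.setdefault(tc.index, {"name": None, "arguments": ""})
        if tc.function.name:
            entry["name"] = tc.function.name
        if tc.function.arguments:
            entry["arguments"] += tc.function.arguments

print(calls)  # e.g. {0: {'name': 'mcp_confluence', 'arguments': '{"query": "coffee", "limit": 1}'}}
```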
@avigny hey! I've been trying to test your solution but with no success. This is what I'm doing: where template.jinja is this one and mistral_tool_parser.py is the one that you've created. I'm using this test request: When I set stream to false, I'm getting this response: And this error: When I set stream=true, I don't receive any errors, but the response does not have tool calls: Am I doing something wrong here?
Looks like this PR unfortunately doesn't fix the issues on Mistral Small 3.2. API call:
{
"stream": false,
"temperature": 0.15,
"top_p": 1.0,
"tool_choice": "auto",
"model": "mistralai/Mistral-Small-3.2-24B-Instruct-2506",
"messages": [
{
"role": "user",
"content": "Hi ! What's the result of 95478415 / 4571 ?"
}
],
"tools": [
{
"type":"function",
"function": {
"name":"calculator",
"description":"Perform a basic calculation using ruby syntax for arithmetic operations.",
"parameters": {
"type":"object",
"properties": {
"calculation": {
"type":"string",
"description":"A basic arithmetic calculation in python language (e.g., \"2+2\", \"10*3\", \"45/9\").",
"required":["calculation"]
}
},
"required":["calculation"]
}
}
}
]
}
Still have this error:
Here are some logs:
=== model_output ===
[TOOL_CALLS]calculator{"calculation": "95478415 / 4571"}
=== tool_content ===
calculator{"calculation": "95478415 / 4571"}
Please note that this issue is NOT happening when using …
Yes, you're both right! Thanks for finding this!
Any update on getting this merged?
cc @aarnphm
So I did more complete testing and found this wasn't working that well after all -- I was getting the same errors reported above. Not sure what happened on initial testing. But I've since taken it and have a working implementation, for streaming at least, at https://github.com/sjuxax/vllm/tree/Mistral3.2-tool-call-fix. I'm going to cherry-pick it onto #20471 in a sec. Then using that branch should work with quantized HF models and tool calling.
I think replacing lines 127:139 with this below will fix it for non-streaming:

    # First, split on the tool call token; discard the first item, which is
    # the (empty) text before the token.
    raw_tool_calls = model_output.split(self.bot_token)[1:]
    function_call_arr = []
    for raw_tool_call in raw_tool_calls:
        tool_name = raw_tool_call.split("{")[0]
        tool_arguments_begin = raw_tool_call.find("{")
        tool_arguments = raw_tool_call[tool_arguments_begin:]
        function_call_arr.append({
            "name": tool_name,
            "arguments": json.loads(tool_arguments)
        })
This pull request has merge conflicts that must be resolved before it can be merged.
…-streaming-update
# Conflicts:
#	vllm/entrypoints/openai/tool_parsers/mistral_tool_parser.py
same as vllm-project#26633
About the errors during the streaming parsing: I've added a try/except to be like other parsers. See:
vllm/entrypoints/openai/tool_parsers/llama4_pythonic_tool_parser.py, lines 211 to 216 in 5afd327
vllm/entrypoints/openai/tool_parsers/deepseekv31_tool_parser.py, lines 390 to 392 in 5afd327
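A minimal sketch of that pattern, hedged: the wrapper function below is hypothetical, but `extract_tool_calls_streaming` is the real parser entry point that the linked files guard this way.

```python
import logging

logger = logging.getLogger(__name__)

def safe_streaming_parse(parser, *args, **kwargs):
    # Degrade to "no delta" on a malformed partial chunk instead of failing
    # the whole request, mirroring the llama4 pythonic and deepseek v3.1
    # parsers linked above.
    try:
        return parser.extract_tool_calls_streaming(*args, **kwargs)
    except Exception:
        logger.exception("Error trying to handle streaming tool call.")
        return None
```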
@avigny I didn't review all of the commits from scratch again, but the latest commits checking just for the bot_token_id (vs text) and error handling look good to me. Thanks for getting this in good shape!!
…ct-2506" (this reverts commit 58512a6)
…om chat template
Force-pushed from ed1a188 to 47ce613
@avigny I can see you are putting in a lot of work getting Mistral tool calling to work! Thanks for all your effort! Can you share what the current status is?
cc @chaunceyjiang can you take a look? |
Very cool!
The large-model test is no longer being added.
Looking forward to this merge!
If this gets merged and incorporated in a new release of the vLLM Docker image by early next week, y'all are for real goated. That would be perfect timing for me. Happy Halloween!
Purpose
Fixes #13622
Fixes #17585
Fixes #20028
This PR is similar to #16096 (hermes tool parser)
In summary
Repairs tool calls in streaming mode for (older) models with tokenizer version <v11.
The model output is incrementally parsed with `ijson`, emitting events that indicate which part of the tool call is currently being streamed. For more details see `_extract_tool_calls_streaming_pre_v11_tokenizer`; a sketch of this event-driven style follows below.
Quick unit tests added in `tests/tool_use/test_mistral_tool_parser.py`; see `test_extract_tool_calls_streaming_pre_v11_tokenizer`.
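For illustration, a minimal, hedged sketch of the `ijson` event-driven style; this is not the PR's exact code, and the sample payload is an assumption about the pre-v11 list-of-calls format.

```python
import io
import ijson

# Pre-v11 models emit tool calls as a JSON list of {"name": ..., "arguments": ...}
# objects (assumed sample payload for this sketch).
payload = io.BytesIO(b'[{"name": "calculator", "arguments": {"calculation": "2+2"}}]')

for prefix, event, value in ijson.parse(payload):
    # Events arrive incrementally, so a streaming parser can emit the tool
    # name as soon as it is complete, before the arguments have finished.
    if (prefix, event) == ("item.name", "string"):
        print("tool name:", value)
    elif prefix.startswith("item.arguments"):
        print("argument event:", prefix, event, value)
```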
tests/tool_use/test_mistral_tool_parser.pyseetest_extract_tool_calls_streaming_pre_v11_tokenizerAdds support for tool calls in streaming mode for recent models (tokenizer version >=v11)
See
_extract_tool_calls_streamingfor implementation detailsTest added for
mistralai/Mistral-Small-3.2-24B-Instruct-2506intests/tool_use/test_mistral_tool_parser.pyQuick unit tests added in
tests/tool_use/test_mistral_tool_parser.pyseetest_extract_tool_calls_streamingTest Plan
I've added a test file, `tests/tool_use/test_mistral_tool_parser.py`, for easy and fast testing. It works similarly to the existing `tests/tool_use/test_jamba_tool_parser.py`: it exercises the parsing functions with mocked model output, which makes it easy to test edge cases.
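As a hedged sketch of that mocked-streaming style: the helper below is hypothetical, but `extract_tool_calls_streaming` and its parameters are the real vLLM tool-parser entry point.

```python
def stream_deltas(parser, tokens: list[str], request=None):
    # Feed the mocked model output to the parser one token at a time and
    # collect whatever DeltaMessage chunks it emits, mimicking how the
    # serving layer drives tool parsers during streaming.
    previous = ""
    for token in tokens:
        current = previous + token
        delta = parser.extract_tool_calls_streaming(
            previous_text=previous,
            current_text=current,
            delta_text=token,
            previous_token_ids=[],  # simplified: token ids elided in this sketch
            current_token_ids=[],
            delta_token_ids=[],
            request=request,
        )
        if delta is not None:
            yield delta
        previous = current
```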
Use `pytest tests/tool_use/test_mistral_tool_parser.py` to run this test file.
Test added for `mistralai/Mistral-Small-3.2-24B-Instruct-2506` in `tests/tool_use/test_mistral_tool_parser.py`.
(Optional) Documentation Update
I believe no documentation update is needed.