
Conversation

@gcalmettes gcalmettes (Contributor) commented Mar 2, 2025

The MistralTokenizer does not currently support the skip_special_tokens=False argument.

Because an assert is used to check the skip_special_tokens value, a request that passes "skip_special_tokens": false in its payload trips the assertion and triggers a shutdown of the whole vLLM engine.

We should probably not crash the full engine because of a single client's argument.
This PR validates the request when the MistralTokenizer is used, ensuring that the skip_special_tokens parameter is not set to False; otherwise an error is returned to the client.
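
For illustration, here is a minimal sketch of what a client could expect once per-request validation is in place, assuming the serving layer maps the validation error to an HTTP 400 (the exact status code and message depend on vLLM's error handling):

```python
from openai import OpenAI, BadRequestError

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

try:
    client.chat.completions.create(
        messages=[{"role": "user", "content": "What is 1+1?"}],
        model="mistral-small",
        extra_body={"skip_special_tokens": False},  # not supported by the Mistral tokenizer
    )
except BadRequestError as exc:
    # Only this request fails; the engine keeps serving other requests.
    print(f"Request rejected: {exc}")
```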

cc: @patrickvonplaten

Reproduce the error:

  1. Start the latest vLLM version:

```bash
docker run --rm -d --gpus all --shm-size=1G --ulimit memlock=-1 --ulimit stack=67108864 vllm/vllm-openai:v0.7.2 --model=mistralai/Mistral-Small-24B-Instruct-2501 --tokenizer-mode mistral --served-model-name mistral-small
```

  2. Make a valid OpenAI query with the skip_special_tokens parameter set to False:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-used",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 1+1?."},
]

response = client.chat.completions.create(
    messages=messages,
    model="mistral-small",
    extra_body={
        "skip_special_tokens": False
    }
)
```

This leads to a shutdown of the server:

```
INFO 03-02 03:51:06 engine.py:275] Added request chatcmpl-63d252b594554967af670dd479402073.
CRITICAL 03-02 03:51:06 launcher.py:101] MQLLMEngine is already dead, terminating server process
INFO:     172.17.0.1:54396 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR 03-02 03:51:06 engine.py:139] AssertionError('skip_special_tokens=False is not supported for Mistral tokenizers.')
ERROR 03-02 03:51:06 engine.py:139] Traceback (most recent call last):
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 137, in start
ERROR 03-02 03:51:06 engine.py:139]     self.run_engine_loop()
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 200, in run_engine_loop
ERROR 03-02 03:51:06 engine.py:139]     request_outputs = self.engine_step()
ERROR 03-02 03:51:06 engine.py:139]                       ^^^^^^^^^^^^^^^^^^
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 218, in engine_step
ERROR 03-02 03:51:06 engine.py:139]     raise e
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 209, in engine_step
ERROR 03-02 03:51:06 engine.py:139]     return self.engine.step()
ERROR 03-02 03:51:06 engine.py:139]            ^^^^^^^^^^^^^^^^^^
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 1386, in step
ERROR 03-02 03:51:06 engine.py:139]     outputs = self.model_executor.execute_model(
ERROR 03-02 03:51:06 engine.py:139]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 138, in execute_model
ERROR 03-02 03:51:06 engine.py:139]     output = self.collective_rpc("execute_model",
ERROR 03-02 03:51:06 engine.py:139]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 51, in collective_rpc
ERROR 03-02 03:51:06 engine.py:139]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 03-02 03:51:06 engine.py:139]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2220, in run_method
ERROR 03-02 03:51:06 engine.py:139]     return func(*args, **kwargs)
ERROR 03-02 03:51:06 engine.py:139]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 413, in execute_model
ERROR 03-02 03:51:06 engine.py:139]     output = self.model_runner.execute_model(
ERROR 03-02 03:51:06 engine.py:139]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 03-02 03:51:06 engine.py:139]     return func(*args, **kwargs)
ERROR 03-02 03:51:06 engine.py:139]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1772, in execute_model
ERROR 03-02 03:51:06 engine.py:139]     model_input.async_callback()
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 1149, in weak_bound
ERROR 03-02 03:51:06 engine.py:139]     unbound(inst, *args, **kwargs)
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 1107, in _process_model_outputs
ERROR 03-02 03:51:06 engine.py:139]     self.output_processor.process_outputs(
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/output_processor/single_step.py", line 97, in process_outputs
ERROR 03-02 03:51:06 engine.py:139]     return self._process_sequence_group_outputs(sequence_group, outputs[0],
ERROR 03-02 03:51:06 engine.py:139]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/output_processor/single_step.py", line 124, in _process_sequence_group_outputs
ERROR 03-02 03:51:06 engine.py:139]     new_char_count = self.detokenizer.decode_sequence_inplace(
ERROR 03-02 03:51:06 engine.py:139]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/detokenizer.py", line 119, in decode_sequence_inplace
ERROR 03-02 03:51:06 engine.py:139]     seq.read_offset) = convert_prompt_ids_to_tokens(
ERROR 03-02 03:51:06 engine.py:139]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/detokenizer_utils.py", line 66, in convert_prompt_ids_to_tokens
ERROR 03-02 03:51:06 engine.py:139]     new_tokens = tokenizer.convert_ids_to_tokens(
ERROR 03-02 03:51:06 engine.py:139]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/tokenizers/mistral.py", line 378, in convert_ids_to_tokens
ERROR 03-02 03:51:06 engine.py:139]     skip_special_tokens
ERROR 03-02 03:51:06 engine.py:139] AssertionError: skip_special_tokens=False is not supported for Mistral tokenizers.
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [1]
```

github-actions bot commented Mar 2, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@gcalmettes gcalmettes force-pushed the fix/mistral-tokenizer-assert-skip-special-tokens branch from 40442a9 to 771b2ea on March 3, 2025
@njhill njhill (Member) left a comment

LGTM, would be great if @patrickvonplaten could check too.

The alternative I guess is to add request validation for this (e.g. using something like a supports_skip_special_tokens() method on the tokenizers), to fail individual requests.
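
A rough sketch of that alternative, using hypothetical class and method names rather than actual vLLM code, might look like this:

```python
# Hypothetical capability flag on the tokenizer classes, checked at request
# validation time so that only the offending request is rejected.

class MistralTokenizerWrapper:
    def supports_skip_special_tokens(self) -> bool:
        return False  # Mistral decoding always skips special tokens

class HFTokenizerWrapper:
    def supports_skip_special_tokens(self) -> bool:
        return True

def validate_request(tokenizer, skip_special_tokens: bool) -> None:
    # Reject the single request instead of asserting inside the engine loop.
    if skip_special_tokens is False and not tokenizer.supports_skip_special_tokens():
        raise ValueError(
            "skip_special_tokens=False is not supported by this tokenizer.")
```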

@njhill njhill changed the title from "[Bugfix]: do not shutdown server is skip_special_use=False for MistralTokenizewr" to "[Bugfix]: do not shutdown server is skip_special_use=False for MistralTokenizer" Mar 7, 2025
@gcalmettes gcalmettes changed the title from "[Bugfix]: do not shutdown server is skip_special_use=False for MistralTokenizer" to "[Bugfix]: do not shutdown server if skip_special_use=False for MistralTokenizer" Apr 9, 2025
@gcalmettes gcalmettes (Contributor, Author) commented:

@njhill would you prefer the supports_skip_special_tokens() approach? I can definitely go that route if needed.

@gcalmettes gcalmettes force-pushed the fix/mistral-tokenizer-assert-skip-special-tokens branch from d3295ab to e6cf7a9 on April 9, 2025
@gcalmettes gcalmettes (Contributor, Author) commented Apr 9, 2025

@njhill I actually made the change, as it results in a better user experience: the individual requests are failed and the user receives a proper error (we already catch the ValueError in the preprocessing step) explicitly saying that skip_special_tokens=False is not supported with the Mistral tokenizer. That way we also do not change a parameter without the user knowing.

Since we already perform some custom actions in the input preprocessing steps when the MistralTokenizer is used (tool serialization, tool call id truncation), I just added another method to validate the request params.
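
A minimal sketch of the kind of preprocessing-time check described here, with an illustrative helper name rather than the exact one used in the PR (the imports reference existing vLLM modules, e.g. the mistral tokenizer module seen in the traceback above):

```python
# Illustrative sketch: a request-parameter check run during input preprocessing
# when the Mistral tokenizer is in use. Raising ValueError lets the existing
# error handling turn it into a client-facing error instead of an engine crash.
from vllm import SamplingParams
from vllm.transformers_utils.tokenizers.mistral import MistralTokenizer

def validate_mistral_request(tokenizer, params: SamplingParams) -> None:
    if isinstance(tokenizer, MistralTokenizer) and params.skip_special_tokens is False:
        raise ValueError(
            "skip_special_tokens=False is not supported for Mistral tokenizers.")
```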

@mergify mergify bot added the frontend label Apr 9, 2025
@gcalmettes gcalmettes (Contributor, Author) commented Apr 9, 2025

cc @DarkLight1337, another Mistral-related fix. Note that this assert would not be caught by #16344, as it happens in the decode phase.

@DarkLight1337 DarkLight1337 (Member) left a comment

This seems reasonable to me as well!

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) April 9, 2025 16:37
@github-actions github-actions bot added the ready label Apr 9, 2025
@gcalmettes gcalmettes (Contributor, Author) commented:

CI test failures (unrelated):

  • timeout on one of the Entrypoints tests:
FAILED entrypoints/openai/test_basic.py::test_request_cancellation[default-frontend-multiprocessing] - openai.APITimeoutError: Request timed out.
  • connection error on one of the V1 tests:
FAILED entrypoints/openai/correctness/test_lmeval.py::test_lm_eval_accuracy_v1_engine - aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host localhost:45497 ssl:default [Connect call failed ('127.0.0.1', 45497)]

@vllm-bot vllm-bot merged commit 1da6a09 into vllm-project:main Apr 10, 2025
45 of 50 checks passed
yangw-dev pushed a commit to yangw-dev/vllm that referenced this pull request Apr 21, 2025
…ralTokenizer (vllm-project#14094)

Signed-off-by: Guillaume Calmettes <[email protected]>
Signed-off-by: Yang Wang <[email protected]>
jikunshang pushed a commit to jikunshang/vllm that referenced this pull request Apr 29, 2025
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
…ralTokenizer (vllm-project#14094)

Signed-off-by: Guillaume Calmettes <[email protected]>
Signed-off-by: Mu Huai <[email protected]>