
Conversation

@gcalmettes gcalmettes (Contributor) commented Mar 2, 2025

The MistralTokenizer does not currently support the skip_special_tokens=False argument.

Because an assert is used to check the skip_special_tokens value, a request that passes "skip_special_tokens": false in its payload trips the assertion and triggers a shutdown of the whole vLLM engine.

We should probably not crash the full engine because of a single client's argument.
This PR validates the request when the MistralTokenizer is used, ensuring that the skip_special_tokens parameter is not set to False; otherwise an error is returned to the client.
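
For illustration, here is a minimal sketch of what a client could expect once per-request validation is in place, assuming the serving layer maps the validation error to an HTTP 400 (the exact status code and message depend on vLLM's error handling):

```python
from openai import OpenAI, BadRequestError

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

try:
    client.chat.completions.create(
        messages=[{"role": "user", "content": "What is 1+1?"}],
        model="mistral-small",
        extra_body={"skip_special_tokens": False},  # not supported by the Mistral tokenizer
    )
except BadRequestError as exc:
    # Only this request fails; the engine keeps serving other requests.
    print(f"Request rejected: {exc}")
```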

cc: @patrickvonplaten

Reproduce the error:

  1. Start the latest vLLM version:

```bash
docker run --rm -d --gpus all --shm-size=1G --ulimit memlock=-1 --ulimit stack=67108864 vllm/vllm-openai:v0.7.2 --model=mistralai/Mistral-Small-24B-Instruct-2501 --tokenizer-mode mistral --served-model-name mistral-small
```

  2. Make a valid OpenAI query with the skip_special_tokens parameter set to False:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-used",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 1+1?."},
]

response = client.chat.completions.create(
    messages=messages,
    model="mistral-small",
    extra_body={
        "skip_special_tokens": False
    }
)
```

This leads to a shutdown of the server:

```
INFO 03-02 03:51:06 engine.py:275] Added request chatcmpl-63d252b594554967af670dd479402073.
CRITICAL 03-02 03:51:06 launcher.py:101] MQLLMEngine is already dead, terminating server process
INFO:     172.17.0.1:54396 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR 03-02 03:51:06 engine.py:139] AssertionError('skip_special_tokens=False is not supported for Mistral tokenizers.')
ERROR 03-02 03:51:06 engine.py:139] Traceback (most recent call last):
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 137, in start
ERROR 03-02 03:51:06 engine.py:139]     self.run_engine_loop()
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 200, in run_engine_loop
ERROR 03-02 03:51:06 engine.py:139]     request_outputs = self.engine_step()
ERROR 03-02 03:51:06 engine.py:139]                       ^^^^^^^^^^^^^^^^^^
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 218, in engine_step
ERROR 03-02 03:51:06 engine.py:139]     raise e
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 209, in engine_step
ERROR 03-02 03:51:06 engine.py:139]     return self.engine.step()
ERROR 03-02 03:51:06 engine.py:139]            ^^^^^^^^^^^^^^^^^^
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 1386, in step
ERROR 03-02 03:51:06 engine.py:139]     outputs = self.model_executor.execute_model(
ERROR 03-02 03:51:06 engine.py:139]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 138, in execute_model
ERROR 03-02 03:51:06 engine.py:139]     output = self.collective_rpc("execute_model",
ERROR 03-02 03:51:06 engine.py:139]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 51, in collective_rpc
ERROR 03-02 03:51:06 engine.py:139]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 03-02 03:51:06 engine.py:139]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2220, in run_method
ERROR 03-02 03:51:06 engine.py:139]     return func(*args, **kwargs)
ERROR 03-02 03:51:06 engine.py:139]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 413, in execute_model
ERROR 03-02 03:51:06 engine.py:139]     output = self.model_runner.execute_model(
ERROR 03-02 03:51:06 engine.py:139]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 03-02 03:51:06 engine.py:139]     return func(*args, **kwargs)
ERROR 03-02 03:51:06 engine.py:139]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1772, in execute_model
ERROR 03-02 03:51:06 engine.py:139]     model_input.async_callback()
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 1149, in weak_bound
ERROR 03-02 03:51:06 engine.py:139]     unbound(inst, *args, **kwargs)
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 1107, in _process_model_outputs
ERROR 03-02 03:51:06 engine.py:139]     self.output_processor.process_outputs(
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/output_processor/single_step.py", line 97, in process_outputs
ERROR 03-02 03:51:06 engine.py:139]     return self._process_sequence_group_outputs(sequence_group, outputs[0],
ERROR 03-02 03:51:06 engine.py:139]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/output_processor/single_step.py", line 124, in _process_sequence_group_outputs
ERROR 03-02 03:51:06 engine.py:139]     new_char_count = self.detokenizer.decode_sequence_inplace(
ERROR 03-02 03:51:06 engine.py:139]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/detokenizer.py", line 119, in decode_sequence_inplace
ERROR 03-02 03:51:06 engine.py:139]     seq.read_offset) = convert_prompt_ids_to_tokens(
ERROR 03-02 03:51:06 engine.py:139]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/detokenizer_utils.py", line 66, in convert_prompt_ids_to_tokens
ERROR 03-02 03:51:06 engine.py:139]     new_tokens = tokenizer.convert_ids_to_tokens(
ERROR 03-02 03:51:06 engine.py:139]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-02 03:51:06 engine.py:139]   File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/tokenizers/mistral.py", line 378, in convert_ids_to_tokens
ERROR 03-02 03:51:06 engine.py:139]     skip_special_tokens
ERROR 03-02 03:51:06 engine.py:139] AssertionError: skip_special_tokens=False is not supported for Mistral tokenizers.
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [1]
```

github-actions bot commented Mar 2, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@gcalmettes gcalmettes force-pushed the fix/mistral-tokenizer-assert-skip-special-tokens branch from 40442a9 to 771b2ea on March 3, 2025
@njhill njhill (Member) left a comment

LGTM, would be great if @patrickvonplaten could check too.

The alternative I guess is to add request validation for this (e.g. using something like a supports_skip_special_tokens() method on the tokenizers), to fail individual requests.
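
A rough sketch of that alternative, using hypothetical class and method names rather than actual vLLM code, might look like this:

```python
# Hypothetical capability flag on the tokenizer classes, checked at request
# validation time so that only the offending request is rejected.

class MistralTokenizerWrapper:
    def supports_skip_special_tokens(self) -> bool:
        return False  # Mistral decoding always skips special tokens

class HFTokenizerWrapper:
    def supports_skip_special_tokens(self) -> bool:
        return True

def validate_request(tokenizer, skip_special_tokens: bool) -> None:
    # Reject the single request instead of asserting inside the engine loop.
    if skip_special_tokens is False and not tokenizer.supports_skip_special_tokens():
        raise ValueError(
            "skip_special_tokens=False is not supported by this tokenizer.")
```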

@njhill njhill changed the title from "[Bugfix]: do not shutdown server is skip_special_use=False for MistralTokenizewr" to "[Bugfix]: do not shutdown server is skip_special_use=False for MistralTokenizer" Mar 7, 2025
@gcalmettes gcalmettes changed the title from "[Bugfix]: do not shutdown server is skip_special_use=False for MistralTokenizer" to "[Bugfix]: do not shutdown server if skip_special_use=False for MistralTokenizer" Apr 9, 2025
@gcalmettes gcalmettes (Contributor, Author) commented:

@njhill would you prefer the supports_skip_special_tokens() approach? I can definitely go that route if needed.

@gcalmettes gcalmettes force-pushed the fix/mistral-tokenizer-assert-skip-special-tokens branch from d3295ab to e6cf7a9 on April 9, 2025
@gcalmettes gcalmettes (Contributor, Author) commented Apr 9, 2025

@njhill I actually made the change, as it results in a better user experience: the individual requests are failed and the user receives a proper error (we already catch the ValueError in the preprocessing step) explicitly saying that skip_special_tokens=False is not supported with the Mistral tokenizer. That way we also do not change a parameter without the user knowing.

Since we already perform some custom actions in the input preprocessing steps when the MistralTokenizer is used (tool serialization, tool call id truncation), I just added another method to validate the request params.
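
A minimal sketch of the kind of preprocessing-time check described here, with an illustrative helper name rather than the exact one used in the PR (the imports reference existing vLLM modules, e.g. the mistral tokenizer module seen in the traceback above):

```python
# Illustrative sketch: a request-parameter check run during input preprocessing
# when the Mistral tokenizer is in use. Raising ValueError lets the existing
# error handling turn it into a client-facing error instead of an engine crash.
from vllm import SamplingParams
from vllm.transformers_utils.tokenizers.mistral import MistralTokenizer

def validate_mistral_request(tokenizer, params: SamplingParams) -> None:
    if isinstance(tokenizer, MistralTokenizer) and params.skip_special_tokens is False:
        raise ValueError(
            "skip_special_tokens=False is not supported for Mistral tokenizers.")
```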

@mergify mergify bot added the frontend label Apr 9, 2025
@gcalmettes gcalmettes (Contributor, Author) commented Apr 9, 2025

cc @DarkLight1337, another Mistral-related fix. Note that this assert would not be caught by #16344, as it happens in the decode phase.

@DarkLight1337 DarkLight1337 (Member) left a comment

This seems reasonable to me as well!

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) April 9, 2025 16:37
@github-actions github-actions bot added the ready label Apr 9, 2025
@gcalmettes gcalmettes (Contributor, Author) commented:

CI test failures (unrelated):

  • timeout on one of the Entrypoints tests:
FAILED entrypoints/openai/test_basic.py::test_request_cancellation[default-frontend-multiprocessing] - openai.APITimeoutError: Request timed out.
  • connection error on one of the V1 tests:
FAILED entrypoints/openai/correctness/test_lmeval.py::test_lm_eval_accuracy_v1_engine - aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host localhost:45497 ssl:default [Connect call failed ('127.0.0.1', 45497)]

@vllm-bot vllm-bot merged commit 1da6a09 into vllm-project:main Apr 10, 2025
45 of 50 checks passed
yangw-dev pushed a commit to yangw-dev/vllm that referenced this pull request Apr 21, 2025
…ralTokenizer (vllm-project#14094)

Signed-off-by: Guillaume Calmettes <[email protected]>
Signed-off-by: Yang Wang <[email protected]>
jikunshang pushed a commit to jikunshang/vllm that referenced this pull request Apr 29, 2025
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
…ralTokenizer (vllm-project#14094)

Signed-off-by: Guillaume Calmettes <[email protected]>
Signed-off-by: Mu Huai <[email protected]>