
Conversation

@gcalmettes (Contributor) commented Apr 9, 2025

When the MistralTokenizer is used, input preprocessing is delegated to mistral-common, which uses assert statements to stop the process when the received input does not conform to what is expected.

Currently, these AssertionErrors are not caught when inputs are preprocessed in vLLM, so sending an incorrectly formatted message array to a model relying on the MistralTokenizer results in a 500 Internal Server Error.

This PR converts AssertionErrors raised by the MistralTokenizer into ValueErrors so they can be properly caught.
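
A minimal sketch of the idea (illustrative only; the helper name and signature are assumptions, not the actual vLLM change): the call into mistral-common is wrapped so that its bare assert statements surface as a ValueError, which vLLM's OpenAI-compatible frontend can catch and report as a client error instead of an unhandled 500.

def encode_chat_completion_or_raise(mistral_tokenizer, request):
    # Hypothetical wrapper: delegate to mistral-common and convert its
    # assertion-based validation failures into user-facing ValueErrors.
    try:
        return mistral_tokenizer.encode_chat_completion(request)
    except AssertionError as e:
        # e.g. "Message content must be normalized" raised in
        # mistral_common/tokens/tokenizers/sentencepiece.py
        raise ValueError(str(e)) from e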

# server started with: docker run --rm -d --gpus all --shm-size=1G --ulimit memlock=-1 vllm/vllm-openai:v0.8.3 --served-model-name mistral-small  --model=mistralai/Mistral-Small-3.1-24B-Instruct-2503 --tokenizer-mode mistral --load-format mistral --config-format mistral

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8003/v1",
    api_key="API_KEY",
)

response = client.chat.completions.create(
    model="mistral-small",
    messages=[
        {
            "role": "system",
            # system role content does not support the List[str] format like for user role
            "content": [ 
                {"type": "text", "text": "You are an expert at Maths"},
            ],
        },
        {"role": "user", "content": "What is 1+1?"},
    ]
)

print(response.choices[0].message.content)

The assertion error results in a 500 Internal Server Error:

  File "/usr/local/lib/python3.12/dist-packages/fastapi/routing.py", line 301, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/fastapi/routing.py", line 212, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/utils.py", line 63, in wrapper
    return handler_task.result()
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/utils.py", line 85, in wrapper
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 476, in create_chat_completion
    generator = await handler.create_chat_completion(request, raw_request)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 181, in create_chat_completion
    ) = await self._preprocess_chat(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_engine.py", line 409, in _preprocess_chat
    request_prompt = apply_mistral_chat_template(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/chat_utils.py", line 1198, in apply_mistral_chat_template
    return tokenizer.apply_chat_template(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/tokenizers/mistral.py", line 372, in apply_chat_template
    encoded = self.mistral.encode_chat_completion(request)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/mistral_common/tokens/tokenizers/mistral.py", line 245, in encode_chat_completion
    return self.instruct_tokenizer.encode_instruct(instruct_request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/mistral_common/tokens/tokenizers/sentencepiece.py", line 232, in encode_instruct
    new_tokens = self.encode_system_message(msg)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/mistral_common/tokens/tokenizers/sentencepiece.py", line 609, in encode_system_message
    assert isinstance(message.content, str), "Message content must be normalized"

github-actions bot commented Apr 9, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the frontend label Apr 9, 2025
@DarkLight1337 (Member) commented Apr 9, 2025

Perhaps this is better solved by updating mistral_common? cc @patrickvonplaten

@DarkLight1337 (Member) commented

IMO assertion errors should not be treated like user errors

@gcalmettes (Contributor, Author) commented

I agree with you @DarkLight1337.

However, I have yet to see an external PR on mistral-common get merged.
E.g.: this PR regarding compliance with the OpenAI API was opened last September (and as you can see, I re-pinged them on it), and in the end we had to add a workaround in vLLM to handle the inconsistency.

@DarkLight1337 (Member) commented

To avoid mistakenly catching assertion errors raised inside vLLM itself, can we add a try-except block that wraps the Mistral tokenizer and converts its assertion errors to value errors?
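
A sketch of what such a wrapper could look like (hypothetical decorator name; not the merged vLLM change): only the MistralTokenizer methods that call into mistral-common get their AssertionError converted, so assertion errors raised elsewhere in vLLM still surface as bugs.

from functools import wraps

def assert_to_value_error(func):
    """Re-raise AssertionError from the wrapped call as ValueError
    so the frontend can report it as a bad request."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except AssertionError as e:
            raise ValueError(str(e)) from e
    return wrapper

Applied only at the tokenizer boundary (e.g. to MistralTokenizer.apply_chat_template), the conversion stays confined to calls that delegate to mistral-common.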

@gcalmettes (Contributor, Author) commented

Yes, this is a good idea. I'll update the PR to go in that direction.

@gcalmettes gcalmettes marked this pull request as draft April 9, 2025 13:45
@gcalmettes gcalmettes changed the title [Bugfix] add AssertionError to catched errors when preprocessing inputs [Bugfix] catch AssertionError in MistralTokenizer as ValueError Apr 9, 2025
@gcalmettes gcalmettes force-pushed the fix/preprocessing-assertion branch from 3038633 to f18e9f8 on April 9, 2025 14:46
@gcalmettes gcalmettes marked this pull request as ready for review April 9, 2025 14:48
@DarkLight1337 (Member) left a review comment

This should work for now. Please fix pre-commit

@gcalmettes (Contributor, Author) commented

@DarkLight1337 I made the change you proposed and added a note to explain why we are doing an error type conversion.

Signed-off-by: Guillaume Calmettes <[email protected]>
@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 9, 2025
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) April 9, 2025 15:02
@DarkLight1337 DarkLight1337 disabled auto-merge April 9, 2025 15:03
Signed-off-by: Guillaume Calmettes <[email protected]>
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) April 9, 2025 15:05
@DarkLight1337 DarkLight1337 merged commit c3b5189 into vllm-project:main Apr 9, 2025
43 checks passed
yangw-dev pushed a commit to yangw-dev/vllm that referenced this pull request Apr 21, 2025
jikunshang pushed a commit to jikunshang/vllm that referenced this pull request Apr 29, 2025
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025