
[Bug]: Major issues with guided generation / structured output in vLLM (up to and including v0.8.1); many examples provided by vllm in /examples and structured_outputs.html doc do not work #15236

@agm-eratosth


Your current environment


🐛 Describe the bug

I'm running the examples verbatim from vLLM's structured-output example code (https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_chat_completion_structured_outputs.py) and getting errors in many of them. Using response_format also produces errors, and the errors vary across releases from v0.6.3.post1 through v0.8.1; I've tested every version between those two with Llama 3.3 and found inconsistent behavior. The unit-test coverage for guided generation does not seem robust, especially given that the provided examples fail when run without modification.

Specifically, the issues start with the following code from openai_chat_completion_structured_outputs.py; all subsequent examples in that file also throw errors:

from enum import Enum

from openai import OpenAI
from pydantic import BaseModel

# Client pointed at the locally served vLLM OpenAI-compatible endpoint
client = OpenAI(base_url="http://localhost:8000/v1", api_key="-")


# Guided decoding by JSON using Pydantic schema
class CarType(str, Enum):
    sedan = "sedan"
    suv = "SUV"
    truck = "Truck"
    coupe = "Coupe"


class CarDescription(BaseModel):
    brand: str
    model: str
    car_type: CarType


json_schema = CarDescription.model_json_schema()

prompt = ("Generate a JSON with the brand, model and car_type of "
          "the most iconic car from the 90's")
completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{
        "role": "user",
        "content": prompt,
    }],
    extra_body={"guided_json": json_schema},
)
print(completion.choices[0].message.content)

The error I get is:

BadRequestError: Error code: 400 - {'object': 'error', 'message': 'The provided JSON schema contains features not supported by xgrammar.', 'type': 'BadRequestError', 'param': None, 'code': 400}
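For reference, on Pydantic 2.x CarDescription.model_json_schema() produces roughly the following; my guess (not verified) is that the $defs/$ref indirection used for the enum is what trips xgrammar's unsupported-feature check:

{
    "$defs": {
        "CarType": {
            "enum": ["sedan", "SUV", "Truck", "Coupe"],
            "title": "CarType",
            "type": "string"
        }
    },
    "properties": {
        "brand": {"title": "Brand", "type": "string"},
        "model": {"title": "Model", "type": "string"},
        "car_type": {"$ref": "#/$defs/CarType"}
    },
    "required": ["brand", "model", "car_type"],
    "title": "CarDescription",
    "type": "object"
}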

As mentioned above, I'm unable to run any of the other subsequent examples in /examples/online_serving/openai_chat_completion_structured_outputs.py without receiving some sort of error; those are "Guided decoding by Grammar" and "Extra backend options" (see the sketch below).
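For completeness, the failing "Guided decoding by Grammar" example boils down to roughly the following (paraphrased from the example file; it passes a simplified SQL grammar through extra_body):

# Guided decoding by grammar: constrain output to a simplified SQL grammar
simplified_sql_grammar = """
    ?start: select_statement
    ?select_statement: "SELECT " column_list " FROM " table_name
    ?column_list: column_name ("," column_name)*
    ?table_name: identifier
    ?column_name: identifier
    ?identifier: /[a-zA-Z_][a-zA-Z_0-9]*/
"""

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{
        "role": "user",
        "content": "Generate an SQL query to show the 'username' and 'email' from the 'users' table.",
    }],
    extra_body={"guided_grammar": simplified_sql_grammar},
)
print(completion.choices[0].message.content)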

I'm also running into many issues using response_format. For example, take this notebook from the OpenAI cookbook:
https://github.com/openai/openai-cookbook/blob/main/examples/Leveraging_model_distillation_to_fine-tune_a_model.ipynb

where response_format is defined as:

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "grape-variety",
        "schema": {
            "type": "object",
            "properties": {
                "variety": {
                    "type": "string",
                    "enum": varieties.tolist()
                }
            },
            "additionalProperties": False,
            "required": ["variety"],
        },
        "strict": True
    }
}

On v0.8.1, running the cell that contains:

answer = call_model('gpt-4o', generate_prompt(df_france_subset.iloc[0], varieties))
answer

(with "gpt-4o" replaced by "meta-llama/Llama-3.3-70B-Instruct") results in:

BadRequestError: Error code: 400 - {'object': 'error', 'message': 'The provided JSON schema contains features not supported by xgrammar.', 'type': 'BadRequestError', 'param': None, 'code': 400}

On v0.7.3, the same call produces a different error:

BadRequestError: Error code: 400 - {'object': 'error', 'message': "[{'type': 'extra_forbidden', 'loc': ('body', 'metadata'), 'msg': 'Extra inputs are not permitted', 'input': {'distillation': 'wine-distillation'}}, {'type': 'extra_forbidden', 'loc': ('body', 'store'), 'msg': 'Extra inputs are not permitted', 'input': True}]", 'type': 'BadRequestError', 'param': None, 'code': 400}
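For what it's worth, since this particular schema is just a single enum-valued field, the same constraint can presumably be expressed with vLLM's guided_choice extra-body option instead of a JSON schema; a minimal, untested sketch, assuming varieties is the same array used above:

# Sketch: constrain output to one of the known varieties directly,
# bypassing the JSON-schema path entirely.
completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{
        "role": "user",
        "content": generate_prompt(df_france_subset.iloc[0], varieties),
    }],
    extra_body={"guided_choice": varieties.tolist()},
)
print(completion.choices[0].message.content)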

Edit: I'm also seeing issues with some of the examples in https://docs.vllm.ai/en/latest/features/structured_outputs.html

This example, which used to work in v0.7.3, now returns an error:

from pydantic import BaseModel
from openai import OpenAI


class Info(BaseModel):
    name: str
    age: int


client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="dummy")
completion = client.beta.chat.completions.parse(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "My name is Cameron, I'm 28. What's my name and age?"},
    ],
    response_format=Info,
    extra_body=dict(guided_decoding_backend="outlines"),
)

message = completion.choices[0].message
print(message)
assert message.parsed
print("Name:", message.parsed.name)
print("Age:", message.parsed.age)

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
