
[Bug]: Major issues with guided generation / structured output in vLLM (up to and including v0.8.1); many examples provided by vllm in /examples and structured_outputs.html doc do not work #15236

@agm-eratosth


Your current environment


🐛 Describe the bug

I'm running the examples verbatim from vLLM's structured-output example code (https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_chat_completion_structured_outputs.py) and getting errors in many of them. Using response_format also produces errors, and the errors vary across releases from v0.6.3.post1 through v0.8.1; I've tested every version between those two with Llama 3.3 and found inconsistent behavior. The unit-test coverage for guided generation does not seem robust, especially given that the provided examples fail when run without modification.

Specifically, the issues start with the following code from openai_chat_completion_structured_outputs.py; all subsequent examples in that file also throw errors:

from enum import Enum

from openai import OpenAI
from pydantic import BaseModel

# Client pointed at the locally served vLLM OpenAI-compatible endpoint
client = OpenAI(base_url="http://localhost:8000/v1", api_key="-")


# Guided decoding by JSON using Pydantic schema
class CarType(str, Enum):
    sedan = "sedan"
    suv = "SUV"
    truck = "Truck"
    coupe = "Coupe"


class CarDescription(BaseModel):
    brand: str
    model: str
    car_type: CarType


json_schema = CarDescription.model_json_schema()

prompt = ("Generate a JSON with the brand, model and car_type of "
          "the most iconic car from the 90's")
completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{
        "role": "user",
        "content": prompt,
    }],
    extra_body={"guided_json": json_schema},
)
print(completion.choices[0].message.content)

The error I get is:

BadRequestError: Error code: 400 - {'object': 'error', 'message': 'The provided JSON schema contains features not supported by xgrammar.', 'type': 'BadRequestError', 'param': None, 'code': 400}
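For reference, on Pydantic 2.x CarDescription.model_json_schema() produces roughly the following; my guess (not verified) is that the $defs/$ref indirection used for the enum is what trips xgrammar's unsupported-feature check:

{
    "$defs": {
        "CarType": {
            "enum": ["sedan", "SUV", "Truck", "Coupe"],
            "title": "CarType",
            "type": "string"
        }
    },
    "properties": {
        "brand": {"title": "Brand", "type": "string"},
        "model": {"title": "Model", "type": "string"},
        "car_type": {"$ref": "#/$defs/CarType"}
    },
    "required": ["brand", "model", "car_type"],
    "title": "CarDescription",
    "type": "object"
}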

As mentioned above, I'm unable to run any of the other subsequent examples in /examples/online_serving/openai_chat_completion_structured_outputs.py without receiving some sort of error; those are "Guided decoding by Grammar" and "Extra backend options" (see the sketch below).
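For completeness, the failing "Guided decoding by Grammar" example boils down to roughly the following (paraphrased from the example file; it passes a simplified SQL grammar through extra_body):

# Guided decoding by grammar: constrain output to a simplified SQL grammar
simplified_sql_grammar = """
    ?start: select_statement
    ?select_statement: "SELECT " column_list " FROM " table_name
    ?column_list: column_name ("," column_name)*
    ?table_name: identifier
    ?column_name: identifier
    ?identifier: /[a-zA-Z_][a-zA-Z_0-9]*/
"""

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{
        "role": "user",
        "content": "Generate an SQL query to show the 'username' and 'email' from the 'users' table.",
    }],
    extra_body={"guided_grammar": simplified_sql_grammar},
)
print(completion.choices[0].message.content)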

I'm also running into many issues using response_format. For example, take this notebook from the OpenAI cookbook:
https://github.com/openai/openai-cookbook/blob/main/examples/Leveraging_model_distillation_to_fine-tune_a_model.ipynb

where response_format is defined as:

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "grape-variety",
        "schema": {
            "type": "object",
            "properties": {
                "variety": {
                    "type": "string",
                    "enum": varieties.tolist()
                }
            },
            "additionalProperties": False,
            "required": ["variety"],
        },
        "strict": True
    }
}

On v0.8.1, running the cell that contains:

answer = call_model('gpt-4o', generate_prompt(df_france_subset.iloc[0], varieties))
answer

(with "gpt-4o" replaced by "meta-llama/Llama-3.3-70B-Instruct") results in:

BadRequestError: Error code: 400 - {'object': 'error', 'message': 'The provided JSON schema contains features not supported by xgrammar.', 'type': 'BadRequestError', 'param': None, 'code': 400}

On v0.7.3, the same call produces a different error:

BadRequestError: Error code: 400 - {'object': 'error', 'message': "[{'type': 'extra_forbidden', 'loc': ('body', 'metadata'), 'msg': 'Extra inputs are not permitted', 'input': {'distillation': 'wine-distillation'}}, {'type': 'extra_forbidden', 'loc': ('body', 'store'), 'msg': 'Extra inputs are not permitted', 'input': True}]", 'type': 'BadRequestError', 'param': None, 'code': 400}
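For what it's worth, since this particular schema is just a single enum-valued field, the same constraint can presumably be expressed with vLLM's guided_choice extra-body option instead of a JSON schema; a minimal, untested sketch, assuming varieties is the same array used above:

# Sketch: constrain output to one of the known varieties directly,
# bypassing the JSON-schema path entirely.
completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{
        "role": "user",
        "content": generate_prompt(df_france_subset.iloc[0], varieties),
    }],
    extra_body={"guided_choice": varieties.tolist()},
)
print(completion.choices[0].message.content)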

Edit: I'm also seeing issues with some of the examples in https://docs.vllm.ai/en/latest/features/structured_outputs.html

This example, which used to work in v0.7.3, now returns an error:

from pydantic import BaseModel
from openai import OpenAI


class Info(BaseModel):
    name: str
    age: int


client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="dummy")
completion = client.beta.chat.completions.parse(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "My name is Cameron, I'm 28. What's my name and age?"},
    ],
    response_format=Info,
    extra_body=dict(guided_decoding_backend="outlines"),
)

message = completion.choices[0].message
print(message)
assert message.parsed
print("Name:", message.parsed.name)
print("Age:", message.parsed.age)

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
