
Conversation

Collaborator

@benchislett benchislett commented Nov 3, 2025

Purpose

GPT-OSS can escape structured outputs by emitting tokens between <|channel|>final and <|message|>, for example: <|start|>assistant<|channel|>final <|constrain|>JSON<|message|>content....

This PR updates the parser to look for the prefix <|start|>assistant<|channel|>final and then the suffix <|message|>. This doesn't cover all cases, such as tool calling (see #23120), but it does work around a few common issues.
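
For reference, here is a minimal sketch of the prefix-then-suffix check described above, assuming the prefix and suffix are available as token-id lists; the function and parameter names are illustrative, not the exact vLLM code:

# Illustrative sketch only (not the exact vLLM implementation): verify that the
# prefix token sequence appears, and that the suffix follows it within a small
# window, so stray tokens such as " <|constrain|>JSON" don't break detection.
def is_reasoning_end(token_ids, prefix_ids, suffix_ids, max_gap=10):
    def find_end(haystack, needle, start=0):
        # Return the index just past the first match of `needle`, or -1.
        for i in range(start, len(haystack) - len(needle) + 1):
            if haystack[i : i + len(needle)] == needle:
                return i + len(needle)
        return -1

    after_prefix = find_end(token_ids, prefix_ids)
    if after_prefix == -1:
        return False
    after_suffix = find_end(token_ids, suffix_ids, start=after_prefix)
    if after_suffix == -1:
        return False
    # Allow at most `max_gap` stray tokens between the prefix and the suffix.
    return (after_suffix - len(suffix_ids)) - after_prefix <= max_gap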

Test Plan

Added a unit test for GptOssReasoningParser to target is_reasoning_end specifically. Future work can extend this to cover more cases as needed.
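
A sketch of the shape such a test can take (the import path, fixture, and cases below are illustrative, not the exact test added in this PR):

import pytest
from transformers import AutoTokenizer

# Assumed import path and constructor; adapt to the actual vLLM module layout.
from vllm.reasoning import GptOssReasoningParser


@pytest.fixture(scope="module")
def parser():
    tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-120b")
    return GptOssReasoningParser(tokenizer)


@pytest.mark.parametrize(
    ("text", "expected"),
    [
        # Plain final-channel header.
        ("<|start|>assistant<|channel|>final<|message|>{}", True),
        # Extra tokens between "final" and "<|message|>" (the case fixed here).
        ("<|start|>assistant<|channel|>final <|constrain|>JSON<|message|>{}", True),
        # Still inside the reasoning (analysis) channel.
        ("<|start|>assistant<|channel|>analysis<|message|>thinking", False),
    ],
)
def test_is_reasoning_end(parser, text, expected):
    token_ids = parser.model_tokenizer.encode(text)
    assert parser.is_reasoning_end(token_ids) == expected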

Test Result

Tests now passing.

I also ran the benchmark script:

python3 ../benchmarks/benchmark_serving_structured_output.py --backend openai-chat --model openai/gpt-oss-120b --dataset xgrammar_bench --structured-output-ratio 1.0 --request-rate 100 --num-prompts 100 --port 8046 --endpoint /openai/v1/chat/completions --output-len 900 --save-results

and saw 98% accuracy before this change (the same as with the structured-output ratio set to 0.0; at least one confirmed failure case was due to this feature gap), and 100% accuracy after.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a bug in the GptOss reasoning parser, which previously struggled to correctly identify the end of a reasoning block when extra tokens were present between the final channel and the message tag. The implemented fix introduces a more flexible parsing mechanism that searches for a prefix and suffix separately within a defined token distance, which effectively resolves the issue and enhances parsing accuracy. The inclusion of a comprehensive unit test suite for is_reasoning_end is a valuable addition, as it thoroughly validates the new logic across different scenarios.

I have one suggestion to enhance the performance of the is_reasoning_end method, which could become a performance bottleneck with long input sequences.
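
The bot's specific suggestion is not reproduced here, but one common way to keep such a check cheap (purely illustrative, reusing the is_reasoning_end sketch above, and not necessarily what was proposed) is to scan only a bounded tail of the token list rather than the full sequence on every call:

# Illustrative only: assumes the end-of-reasoning markers, if present, appear within
# the last TAIL_WINDOW tokens, so each call stays constant-time.
TAIL_WINDOW = 64

def is_reasoning_end_bounded(token_ids, prefix_ids, suffix_ids, max_gap=10):
    return is_reasoning_end(token_ids[-TAIL_WINDOW:], prefix_ids, suffix_ids, max_gap)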

@benchislett benchislett added the bug Something isn't working label Nov 3, 2025
# The model can output some special tokens between "final" and "<|message|>",
# so we need to look for both sequences to determine the end of reasoning.
self.reasoning_end_token_ids_prefix = self.model_tokenizer.encode(
    "<|start|>assistant<|channel|>final"
)
Contributor


The model will only output <|start|>assistant if it decides to output a reasoning message; in some situations it may jump straight to the final message. So it may be more reasonable to match just <|channel|>final instead, as that covers both situations.

To cover tool-calling scenarios: as soon as you see to= in the header, the message is a tool call and no longer reasoning.

Both situations are still covered by the ending suffix <|message|>, though.
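
A small sketch of the header-based distinction described above, operating on decoded header text rather than token ids (hypothetical helper, not code from this PR):

# Hypothetical helper: classify a Harmony-style message header such as
# "<|channel|>final <|constrain|>JSON" or "<|channel|>commentary to=functions.lookup".
def classify_header(header: str) -> str:
    if "to=" in header:
        # A recipient in the header means this message is a tool call, not reasoning.
        return "tool_call"
    if "<|channel|>final" in header:
        # Matches whether or not the header begins with <|start|>assistant.
        return "final"
    return "reasoning"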

Collaborator Author

@benchislett benchislett Nov 4, 2025


Updated to cover the first case. I think it would be easiest for me to leave any tool-calling updates to follow-up work, as I have no context on how we do them in vLLM. I'm not entirely sure this code pathway is even used anymore since the work on structured tags.

Signed-off-by: Benjamin Chislett <[email protected]>
@github-project-automation github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Nov 5, 2025
Contributor

@alecsolder alecsolder left a comment


Looks good to me. I have a follow-up PR ready soon which will adjust this for tool calling as well.

@yeqcharlotte yeqcharlotte enabled auto-merge (squash) November 6, 2025 07:47
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 6, 2025
@yeqcharlotte yeqcharlotte merged commit 1890321 into vllm-project:main Nov 7, 2025
45 checks passed

Labels

bug: Something isn't working
gpt-oss: Related to GPT-OSS models
ready: ONLY add when PR is ready to merge/full CI is needed

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants