[Bugfix] Fix and add tests for GptOss reasoning parser #28000
Conversation
Signed-off-by: Benjamin Chislett <[email protected]>
Code Review
This pull request addresses a bug in the GptOss reasoning parser, which previously struggled to correctly identify the end of a reasoning block when extra tokens were present between the final channel and the message tag. The implemented fix introduces a more flexible parsing mechanism that searches for a prefix and suffix separately within a defined token distance, which effectively resolves the issue and enhances parsing accuracy. The inclusion of a comprehensive unit test suite for is_reasoning_end is a valuable addition, as it thoroughly validates the new logic across different scenarios.
I have one suggestion to enhance the performance of the is_reasoning_end method, which could become a performance bottleneck with long input sequences.
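The review's concrete code suggestion is not reproduced here. Purely as an illustration of one common way to bound that cost, here is a sketch that scans only the trailing window of tokens able to newly complete the marker (hypothetical names such as `max_gap`, not vLLM's actual API):

```python
def is_reasoning_end_windowed(token_ids: list[int],
                              prefix_ids: list[int],
                              suffix_ids: list[int],
                              max_gap: int = 10) -> bool:
    # Only the last len(prefix) + max_gap + len(suffix) tokens can newly
    # complete the end-of-reasoning marker, so skip everything before that.
    # Callers would need to latch a True result, since the marker eventually
    # scrolls out of this window as generation continues.
    window = len(prefix_ids) + max_gap + len(suffix_ids)
    tail = token_ids[-window:]
    for i in range(len(tail) - len(prefix_ids) + 1):
        if tail[i:i + len(prefix_ids)] != prefix_ids:
            continue
        rest = tail[i + len(prefix_ids):]
        for j in range(len(rest) - len(suffix_ids) + 1):
            if rest[j:j + len(suffix_ids)] == suffix_ids:
                return True
    return False
```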
```python
# The model can output some special tokens between "final" and "<|message|>",
# so we need to look for both sequences to determine the end of reasoning.
self.reasoning_end_token_ids_prefix = self.model_tokenizer.encode(
    "<|start|>assistant<|channel|>final"
)
```
The model will only output `<|start|>assistant` if it decides to output a reasoning message; in some situations it may jump straight to the final message. So it may be more reasonable to match just `<|channel|>final` instead, as that covers both situations.
To cover tool-calling scenarios: as soon as you see `to=` in the header, the message is a tool call and no longer reasoning.
Both situations can be covered by the ending suffix `<|message|>`, though.
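As a rough illustration of that classification (a hypothetical helper, not vLLM code; extracting the decoded header string is assumed to happen elsewhere):

```python
def classify_header(header: str) -> str:
    # e.g. "<|channel|>commentary to=functions.get_weather" -> tool call
    if "to=" in header:
        return "tool_call"
    # e.g. "<|channel|>final" or "<|start|>assistant<|channel|>final"
    if "<|channel|>final" in header:
        return "final"
    return "reasoning"
```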
Updated to cover the first case. I think it would be easiest for me to leave any tool-calling updates to follow-up work, as I have no context on how we handle them in vLLM. I'm not entirely sure tool calling even ever uses this code pathway since the work using structured tags landed.
alecsolder left a comment:
Looks good to me. I have a follow-up PR ready soon which will adjust this for tool calling as well.
Purpose
GPT-OSS can escape structured outputs by emitting tokens between `<|channel|>final` and `<|message|>`, for example: `<|start|>assistant<|channel|>final <|constrain|>JSON<|message|>content...`. This PR updates the parser to look for the prefix `<|start|>assistant<|channel|>final` and then the suffix `<|message|>`. This doesn't cover all cases, such as tool calling (see #23120), but it does work around a few common issues.
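A minimal sketch of this prefix-then-suffix matching, under assumed names (`max_gap` bounds how many extra tokens may sit between the two sequences; the real implementation may differ):

```python
def is_reasoning_end(token_ids: list[int],
                     prefix_ids: list[int],
                     suffix_ids: list[int],
                     max_gap: int = 10) -> bool:
    """prefix_ids ~ encode("<|start|>assistant<|channel|>final"),
    suffix_ids ~ encode("<|message|>")."""
    n_pre, n_suf = len(prefix_ids), len(suffix_ids)
    for i in range(len(token_ids) - n_pre + 1):
        if token_ids[i:i + n_pre] != prefix_ids:
            continue
        # Tolerate up to max_gap escape tokens (e.g. "<|constrain|>JSON")
        # between the prefix and the "<|message|>" suffix.
        start = i + n_pre
        end = min(len(token_ids), start + max_gap + n_suf)
        for j in range(start, end - n_suf + 1):
            if token_ids[j:j + n_suf] == suffix_ids:
                return True
    return False
```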
Test Plan
Added a unit test for GptOssReasoningParser targeting `is_reasoning_end` specifically. Future work can extend this to cover more cases as needed. A sketch of the kind of case covered is shown below.
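What such a test might look like (pytest-style; the `parser` and `tokenizer` fixtures are assumed for illustration, not the PR's actual fixtures):

```python
def test_is_reasoning_end_with_constrain_tokens(parser, tokenizer):
    # "<|constrain|>JSON" between the final channel and "<|message|>"
    # previously defeated the exact-sequence match.
    token_ids = tokenizer.encode(
        "<|start|>assistant<|channel|>final <|constrain|>JSON<|message|>"
    )
    assert parser.is_reasoning_end(token_ids)

def test_is_reasoning_end_negative(parser, tokenizer):
    # Still inside the analysis channel: reasoning has not ended.
    token_ids = tokenizer.encode("<|channel|>analysis<|message|>thinking...")
    assert not parser.is_reasoning_end(token_ids)
```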
Test Result
Tests now passing.
I also ran the benchmark script and saw 98% accuracy before this change (the same as with the ratio set to 0.0; at least one failure case was confirmed to be due to this feature gap), and 100% accuracy after this change.