
Conversation

@alecsolder (Contributor) commented Nov 5, 2025

Purpose

Building on top of the initial gpt-oss reasoning parser in #25515, this PR fleshes out the full structural tags schema to guide the chat format for gpt-oss.

It is only added to the Responses API path for now, but it should be able to fix issues like the ones mentioned in #24954 without needing to make adjustments to Harmony.

Guided decoding of the gpt-oss chat format is only enabled when structured_outputs_config.enable_in_reasoning is True, since the chat format technically lives in the reasoning section of the model output. Gating it this way also lets us keep using structured outputs for the content of the final message. structured_outputs_config.reasoning_parser is set by default for gpt-oss.
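For illustration, here is a minimal sketch of the kind of structural tag spec this schema produces for the get_weather tool used in the test plan below. The begin/end strings are assumptions for this example, not the literal strings emitted by the PR:

# Hedged sketch: an xgrammar-style structural tag spec that constrains the
# Harmony tool-call section while leaving free-form reasoning unconstrained.
# The begin/end strings below are illustrative only.
structural_tag = {
    "type": "structural_tag",
    "triggers": ["<|channel|>"],
    "structures": [
        {
            "begin": (
                "<|channel|>commentary to=functions.get_weather"
                " <|constrain|>json<|message|>"
            ),
            "schema": {
                "type": "object",
                "properties": {
                    "latitude": {"type": "number"},
                    "longitude": {"type": "number"},
                },
                "required": ["latitude", "longitude"],
                "additionalProperties": False,
            },
            "end": "<|call|>",
        }
    ],
}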

TODO: test that structured output works alongside this after #28000 is merged, and adjust for tool calling

Test Plan

New unit tests and e2e tests with difficult tool names.

To enable guided decoding of the chat format:

vllm serve openai/gpt-oss-20b --structured-outputs-config='{"enable_in_reasoning": true}'

Tested with a curl request like:

curl -X POST http://localhost:8000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Look up the weather roughly in san francisco.",
    "enable_response_messages": true,
    "tools": [{
      "type": "function",
      "name": "get_weather",
      "description": "Get current temperature for provided coordinates in celsius.",
      "parameters": {
        "type": "object",
        "properties": {
          "latitude": {"type": "number"},
          "longitude": {"type": "number"}
        },
        "required": ["latitude", "longitude"],
        "additionalProperties": false
      },
      "strict": true
    }]
  }'
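The same request can be issued from Python; a rough equivalent assuming the OpenAI Python SDK pointed at the local vLLM server (enable_response_messages is a vLLM extension, so it is passed via extra_body):

# Rough equivalent of the curl above using the OpenAI Python SDK against vLLM.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.responses.create(
    model="openai/gpt-oss-20b",
    input="Look up the weather roughly in san francisco.",
    tools=[{
        "type": "function",
        "name": "get_weather",
        "description": "Get current temperature for provided coordinates in celsius.",
        "parameters": {
            "type": "object",
            "properties": {
                "latitude": {"type": "number"},
                "longitude": {"type": "number"},
            },
            "required": ["latitude", "longitude"],
            "additionalProperties": False,
        },
        "strict": True,
    }],
    # vLLM-specific field, not part of the upstream Responses API.
    extra_body={"enable_response_messages": True},
)
print(response.output)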

And confirming it still works with built-in tools:

vllm serve openai/gpt-oss-20b --structured-outputs-config='{"enable_in_reasoning": true}' --tool-server=localhost:8081/browser,localhost:8081/python

With a curl request like:

curl -X POST http://localhost:8000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Search for vLLM performance.",
    "tools": [{"type": "web_search_preview"}]
  }'

Test Result

Valid output for both cases



Alec Solder added 4 commits November 5, 2025 09:13
@gemini-code-assist (bot) left a comment

Code Review

This pull request significantly enhances the tool-calling capabilities for gpt-oss models by introducing a robust, schema-driven guided decoding mechanism for the chat format. The changes are well-structured, separating concerns effectively across different modules. The replacement of the mock-based ToolServer interaction with a more direct tool_names approach in the reasoning parser is a clean refactoring. The addition of comprehensive unit tests and, especially, the new end-to-end tests, provides strong confidence in the correctness and robustness of this new implementation. I have one high-severity comment regarding a mismatch between the production logic and unit tests for the python tool, which should be addressed to ensure full test coverage.

@gemini-code-assist (bot) left a comment

Code Review

This pull request significantly enhances the tool calling capabilities for gpt-oss models by introducing a more robust structural tag schema for guided decoding. The refactoring to decouple the reasoning parser from the ToolServer and instead rely on a list of tool names is a solid architectural improvement, making the system more modular and easier to test. The new unit and end-to-end tests, especially those covering edge cases with tool names, are comprehensive and greatly increase confidence in the changes. I've identified one critical issue in how tool names are generated, which I've detailed in a specific comment.

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

@gemini-code-assist (bot) left a comment

Code Review

This pull request significantly refactors the gpt-oss chat format generation for tool calling by introducing a more robust structural tags schema. The changes decouple the reasoning parser from the ToolServer and instead derive tool information directly from the request messages, which is a great design improvement. The addition of comprehensive unit and end-to-end tests, especially for edge cases, greatly increases confidence in this new implementation.

I have found one high-severity issue in the new get_tool_names_from_messages utility function. The logic for constructing tool names appears to be incorrect for tools that do not have a sub-name within their namespace (e.g., the built-in python tool), which could lead to incorrect guided decoding and tool call failures. A code suggestion has been provided to fix this.
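To make the concern concrete, here is a hypothetical sketch (not the PR's actual helper) of how namespaced tool names could be derived so that built-in tools without a sub-name, such as python, do not pick up a stray separator:

# Hypothetical sketch: derive the tool names used by the structural tag schema.
# Built-in tools with no sub-name (e.g. "python") must stay bare, not "python.".
def build_tool_names(namespaces: dict[str, list[str]]) -> list[str]:
    names: list[str] = []
    for namespace, sub_names in namespaces.items():
        if not sub_names:
            # Namespace itself is the callable tool, e.g. "python".
            names.append(namespace)
        else:
            # Namespaced tools, e.g. "functions.get_weather", "browser.search".
            names.extend(f"{namespace}.{sub}" for sub in sub_names)
    return names

# build_tool_names({"functions": ["get_weather"], "browser": ["search", "open"], "python": []})
# -> ["functions.get_weather", "browser.search", "browser.open", "python"]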

Alec Solder added 2 commits November 5, 2025 10:23
@yeqcharlotte self-requested a review November 5, 2025 19:36
logger = init_logger(__name__)

no_func_reaonsing_tag = {
TRIGGERS = ["<|channel|>", "<|start|>assistant"]
Collaborator:

I wonder if we can define these as some sort of YAML or JSON file, with default values for the tags. That would allow people to modify their template without changing vLLM's binary.

Contributor:

Agreed. Maybe it is best that we have a default_template and load it in here?

@frank-wei (Contributor) commented Nov 7, 2025:

I think this could be the default template, but only if it is at least neutral with respect to general eval tests; otherwise, people may question it.
Also, seconding Charlotte's suggestion, it would be good to have the flexibility of passing a JSON file.
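As a purely illustrative sketch of that suggestion (file path and keys are invented for this example, not part of the PR), the defaults could be made overridable from a JSON file:

# Hypothetical sketch: load Harmony tag strings from an optional JSON file,
# falling back to the hard-coded defaults when no file is given.
import json
from pathlib import Path

DEFAULT_TAGS = {
    # Defaults mirroring the triggers currently hard-coded in the parser.
    "triggers": ["<|channel|>", "<|start|>assistant"],
}

def load_chat_format_tags(path: str | None = None) -> dict:
    """Merge user-provided tag overrides (if any) over the built-in defaults."""
    if path and Path(path).is_file():
        overrides = json.loads(Path(path).read_text())
        return {**DEFAULT_TAGS, **overrides}
    return DEFAULT_TAGS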

@yeqcharlotte (Collaborator): Could you also summarize the main behavior change? Were we previously missing structural tag following in final messages?

@Hanchenli (Contributor): As I understand it, this does not require a function tag to be supplied from an outside file; users only need to specify whether they want to turn the feature on. If so, the code looks good to me.


Labels: frontend, gpt-oss (Related to GPT-OSS models)
Projects: Status: To Triage
4 participants