
Conversation

@levunet (Contributor) commented Sep 12, 2025

Purpose

Fixed a bug where the commentary channel was missing from the valid channels in the system message because the with_custom_tools value was absent when fetching the system message.


@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request addresses a bug that prevented tool usage with gpt-oss models. The issue was caused by the commentary channel being incorrectly removed from the system message, which is necessary for tool call functionality. The fix involves passing a with_custom_tools flag to the get_system_message function, determined by whether tools are present in the request. This change correctly preserves the commentary channel when tools are used. The fix is straightforward and effectively resolves the bug.
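The change described above can be sketched as follows. This is a minimal illustration, not vLLM's actual code: the names get_system_message and with_custom_tools come from the PR description, but the channel-list construction here is a simplified assumption.

```python
# Minimal sketch of the fix. `get_system_message` and `with_custom_tools`
# mirror the PR description; the channel-list construction is a
# simplified assumption, not vLLM's real implementation.

def get_system_message(with_custom_tools: bool = False) -> str:
    """Build the Harmony system message, keeping the `commentary`
    channel only when custom tools are in play."""
    channels = ["analysis", "final"]
    if with_custom_tools:
        channels.insert(1, "commentary")
    return "Valid channels: " + ", ".join(channels)

# Before the fix, the flag was never passed, so tool calls had no
# commentary channel to target:
assert "commentary" not in get_system_message()
# With the fix, requests that carry tools preserve the channel:
assert "commentary" in get_system_message(with_custom_tools=True)
```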

Contributor


Looks good to me. It would be great to add a unit test for this; there are some examples here. The test likely doesn't even need the model to run, just a check that the messages are being generated properly here.

Contributor


A small nit: this will activate the commentary channel even if an empty list of tools is passed in, which is not uncommon. We could avoid that by using something like bool(request.tools) instead of request.tools is not None. That said, I separately discovered this issue, Alec pointed me to this PR, and I agree that this fix is needed to get accurate tool calling in Chat Completions with gpt-oss models.

Without this PR, I regularly see the models outputting non-built-in tool calls to the analysis channel, which isn't where they should go.
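The nit above comes down to Python truthiness; a quick illustration (the request/tool shapes are hypothetical):

```python
# An empty tools list is not None, so `request.tools is not None`
# would still activate the commentary channel; bool() would not.
tools_empty: list = []
tools_present = [{"type": "function", "name": "get_weather"}]  # hypothetical tool

assert (tools_empty is not None) is True   # empty list still counts as "has tools"
assert bool(tools_empty) is False          # bool() treats it as "no tools"
assert bool(tools_present) is True
assert bool(None) is False
```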

@chaunceyjiang (Collaborator) left a comment


LGTM

@github-project-automation github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Sep 16, 2025
@chaunceyjiang chaunceyjiang added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 16, 2025
@levunet levunet force-pushed the feat/fix_tool_check branch from 9f989c4 to 38382b4 Compare September 16, 2025 13:36
Fixed a bug where the commentary value was missing in Invalid Channel due to the absence of with_custom_tools value when fetching the system message.

Signed-off-by: kyt <[email protected]>
@bbrowning (Contributor)

This PR substantially improves Chat Completions streaming tool call handling for Harmony models, especially for gpt-oss-20b. Without this PR, the 20b model often (and 120b sometimes) outputs tool calls to the analysis channel, which is wrong. We could adjust our streaming parser code to handle that, which is a fairly trivial change. However, the more appropriate fix is this PR, which activates the commentary channel when the request contains tools.

It may not reduce 100% of cases where tool calls inappropriately go to the analysis channel, but it makes a massive improvement.

@levunet (Contributor, Author) commented Sep 26, 2025

@bbrowning
There are additional PRs that have modified the content you mentioned. I believe the harmony library improvements have resolved the issue close to 100%. If you need it, I recommend using that harmony code.

#24954
openai/harmony#76

@aarnphm (Collaborator) left a comment


Hopefully this fixes it. Thanks for this!

@bbrowning (Contributor)

Just a note that the CI failures look unrelated, as both are failing in a disk space check on the docker build jobs. This PR doesn't change anything that would impact free disk space on the builder nodes.

@lorenzocollodi

Any updates on this? Or has it been fixed somewhere else?

@borishim

I can confirm that by applying openai/harmony#76, #24768 and #24954 , codex works very well with gpt-oss-120b.

@PedroMiolaSilva

Hey guys, any news here? Looking forward to this merge!

@DarkLight1337 DarkLight1337 merged commit 2ed3f20 into vllm-project:main Oct 3, 2025
43 checks passed
@DarkLight1337 (Member)

Sorry for the delay, merging now

rahul-tuli pushed a commit to neuralmagic/vllm that referenced this pull request Oct 3, 2025
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
tomeras91 pushed a commit to tomeras91/vllm that referenced this pull request Oct 6, 2025
karan pushed a commit to karan/vllm that referenced this pull request Oct 6, 2025
southfreebird pushed a commit to southfreebird/vllm that referenced this pull request Oct 7, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
@bfroemel

I can confirm that by applying openai/harmony#76, #24768 and #24954 , codex works very well with gpt-oss-120b.

@borishim could you share how you made that work besides applying the PRs? Is this with "--tool-call-parser openai --enable-auto-tool-choice" and responses API? Many thanks!

@borishim commented Oct 10, 2025

I can confirm that by applying openai/harmony#76, #24768 and #24954 , codex works very well with gpt-oss-120b.

@borishim could you share how you made that work besides applying the PRs? Is this with "--tool-call-parser openai --enable-auto-tool-choice" and responses API? Many thanks!

Here are the switches I used:
--tool-call-parser openai --reasoning-parser openai-gptoss --enable-auto-tool-choice --max-model-len 131072

Note that I applied the patches against the 0.10.2 docker images. Also, the current codex CLI expects the Chat Completions API for gpt-oss, so I think you should use the Chat Completions API.
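The flags above can be assembled into a full launch command. This is a sketch: the model path is illustrative, and it assumes the patched 0.10.2 image mentioned above.

```shell
# Assumed invocation on a patched 0.10.2 image; the model path is illustrative.
vllm serve openai/gpt-oss-120b \
  --tool-call-parser openai \
  --reasoning-parser openai-gptoss \
  --enable-auto-tool-choice \
  --max-model-len 131072
```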

@levunet (Contributor, Author) commented Oct 10, 2025

@bfroemel @borishim

I found a bug in vLLM 0.11.0 where, after installing FlashInfer (which is then used by default for sampling), specific sentences would repeat infinitely or the model would fail to generate responses.

If you hit this, setting the following environment variable restores normal behavior:

VLLM_USE_FLASHINFER_SAMPLER=0

Also, I think this might be a minor mistake, but you should probably use 'openai_gptoss' for the reasoning-parser.
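Putting the two notes above together, a sketched launch command; the environment variable must be set in the server's environment, and the model path is illustrative.

```shell
# Disable the FlashInfer sampler and use the underscore parser name,
# per the notes above; the model path is illustrative.
VLLM_USE_FLASHINFER_SAMPLER=0 vllm serve openai/gpt-oss-120b \
  --tool-call-parser openai \
  --reasoning-parser openai_gptoss \
  --enable-auto-tool-choice
```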

lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
npanpaliya pushed a commit to odh-on-pz/vllm-cpu that referenced this pull request Oct 27, 2025
Sync to upstream's
[v0.11.0](https://github.com/vllm-project/vllm/releases/tag/v0.11.0)
release + a cherry pick of
vllm-project/vllm#24768

This PR targets CUDA but may also be sufficient for ROCM.

Dockerfile updates:
- general updates to match upstream's Dockerfile
- nvcc, nvrtc and cuobjdump were added for deepgemm JIT requirements:
neuralmagic/nm-vllm-ent@2a545c8
- missing paths were added for triton JIT:
neuralmagic/nm-vllm-ent@b3027fc

Tests:
Branch in nm-cicd:
https://github.com/neuralmagic/nm-cicd/tree/sync-v0.11-cuda
accept-sync:
https://github.com/neuralmagic/nm-cicd/actions/runs/18270550524 --
please ignore unit tests, they need to be updated to v1.
Image tested: quay.io/vllm/automation-vllm:cuda-18270550524
Image validation:
https://github.com/neuralmagic/nm-cicd/actions/runs/18271507914
Whisper runs:
https://github.com/neuralmagic/nm-cicd/actions/runs/18281815955/job/52046560584
https://github.com/neuralmagic/nm-cicd/actions/runs/18281511979

Labels

frontend gpt-oss Related to GPT-OSS models ready ONLY add when PR is ready to merge/full CI is needed

Projects

Status: Done


10 participants