fix: Resolve response format corruption due to incorrect encoding #76
Conversation
I noticed this issue as well; in vLLM I get an error which I believe is related (related gh issue/comment: vllm-project/vllm#22515 (comment)).
UPDATE: I was able to fix this by fetching the latest model files from HF, since this commit fixed the generation_config.json file. My saved files were from before that commit.
@dkundel-openai
@andresC98
I encountered an issue where tool call parameters were sometimes incorrectly generated inside "reasoning_text" instead of "function_call". I can also confirm that updating generation_config.json did not help with this.
Hi team! @dkundel-openai @scott-oai Just a gentle ping on this PR. I understand you might be busy, but I'd appreciate any feedback when you get a chance. This fix addresses some encoding issues that could affect tool functionality stability. Happy to make any adjustments if needed!
Merge this pls
@levunet Although you experimentally found that removing <|constrain|> tends to stabilize tool calling, your assumption that 'the <|constrain|> token does not appear to have been used during training of the model' sounds too strong to me. Can you elaborate on this? Do you have any other observations to support your argument?
@levunet Perhaps a smaller diff like this may work, what do you think? Let me test this as well...

index 6a9305b..d04aad7 100644
--- a/src/encoding.rs
+++ b/src/encoding.rs
@@ -823,7 +823,8 @@ impl Render<Message> for HarmonyEncoding {
// next render the header recipient, if there is one
if let Some(recipient) = &message.recipient {
if recipient != "all" {
- self.render_text_into(format!(" to={recipient}"), into)?;
+ self.render_text_into(" to", into)?;
+ self.render_text_into(format!("={recipient}"), into)?;
}
}
@@ -844,7 +845,7 @@ impl Render<Message> for HarmonyEncoding {
self.render_text_into(" ", into)?;
self.render_formatting_token_into(FormattingToken::ConstrainedFormat, into)?;
if !rest.is_empty() {
- self.render_text_into(rest, into)?;
+ self.render_text_into(format!(" {rest}"), into)?;
}
} else {
self.render_text_into(format!(" {content_type}"), into)?;
Since I organized this based on my experimental results, I think my opinion may have come across as unintentionally too strong. To explain my earlier argument in more detail: I ran hundreds of tests using multiple tool calls with dozens of tools, and during that process I experimented with the <|constrain|> token while trying various tokens for stabilization.

The key peculiarity was that when this token was used, there was a very high probability of the model outputting ' json', which it does not use otherwise. In the opposite case, when the <|constrain|> token was not used, the probability of the model outputting '<|constrain|>' was very low; in my experiments it was never output, though I assumed the probability was just very low rather than zero.

Additionally, normal responses were generated only when the ' to' and ' json' tokens (with leading spaces) were used; when the 'to' and 'json' tokens without spaces were used by mistake, I confirmed that model errors accumulated and the response structure broke down. Based on these results, I concluded that the model tends to reproduce the learned data structure as-is, which led me to think that the <|constrain|> token was likely not used in training.
I'm reporting that this smaller patch was NOT enough to make vLLM-hosted gpt-oss-120b work with Codex. So it looks like your original patch is necessary!
@scott-oai
taking a look today, thanks for the patience |
Pull Request Overview
This PR addresses response format corruption caused by incorrect token encoding during tool execution in the Harmony model. The fix ensures the model uses the correct tokens that were specifically trained for tool requests, preventing abnormal token generation.
Key changes:
- Modified recipient encoding to use ' to' token (id 316) instead of 'to' token (id 935) by splitting the rendering into separate parts
- Removed '<|constrain|>' token usage and ensured mandatory spaces in content type rendering to align with training data
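To make one of the intended changes concrete, here is a rough before/after of the content-type portion of the rendered tool-call header as I read the description (an illustration, not output from the code; functions.get_weather is a placeholder recipient, and the recipient/channel ordering question discussed below is ignored here):
Before: ... to=functions.get_weather <|constrain|>json<|message|>
After:  ... to=functions.get_weather json<|message|>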
there is a lot to unpack here, i think it would be helpful to take this step by step.
'to' vs ' to'
> There was an issue with incorrectly encoding the 'to' token (id 935) and ' to' token (id 316). During model training, it appears the model was configured to use id 316 when using tools, but when encoding produces 935 instead, there's a high probability the model will generate abnormal tokens. To resolve this, I modified it to consistently encode ' to'.
The initial encoding of format!(" to={recipient}") vs " to" followed by format!("={recipient}") i believe should not make a difference... the bpe should always end up encoding " to" either way -- the only way this may not happen would be if there was a super long merge, but in that case you shouldn't see 935 either, you would see one super long token (which i doubt exists). so either way this should always produce 316. you can test it naively like:
#[test]
fn test_to_encodings() {
let encoding = load_harmony_encoding(HarmonyEncodingName::HarmonyGptOss).unwrap();
let mut tokens1 = encoding.tokenizer.encode_ordinary(" to");
tokens1.extend(encoding.tokenizer.encode_ordinary("=functions.get_weather"));
let tokens2 = encoding
.tokenizer
.encode_with_special_tokens(" to=functions.get_weather");
println!("tokens1: {:?}", tokens1);
println!("tokens2: {:?}", tokens2);
assert_eq!(tokens1, tokens2);
}
produces:
running 1 test
tokens1: [316, 28, 44580, 775, 170154]
tokens2: [316, 28, 44580, 775, 170154]
test tests::test_to_encodings ... ok
@levunet do you have any examples of situations that produce 935?
ordering of recipient/channel
is there a reason we changed the ordering here? i don't see a justification for this anywhere, and it does break some of the regression tests
'<|constrain|>' encoding issue
i will have to get back to you on this and will look into it today
self.render_text_into(channel, into)?;
}

// next render the header recipient, if there is one
have you intentionally reversed the order of recipient/channel?
First, regarding the 'to' vs ' to' issue: as you mentioned, there is no problem that would generate token 935. I think this was my mistake. This was an issue I checked a long time ago, and I tried to find the related content again but couldn't confirm it. I will revert to the code from before the change, and if I encounter the same issue again in the future, I will share it with you.

For the recipient/channel ordering, the problem occurred with consecutive tool calls. When tool execution requests are rendered with the recipient after the channel, as in '<|channel|>commentary to=functions.read_file', the model responds normally, but with the recipient before the channel, as in 'assistant to=functions.read_file <|channel|>', the gpt-oss model makes tool calls in the analysis channel.

Problematic data: ['<|channel|>', 'analysis', '<|message|>', 'We', ' need', ' to', ' produce', ' README', ' by', ' checking', ' basic', ' project', ' info', ' and', ' each', ' service', ' operation', '.\n\n', "Let's", ' inspect', ' other', ' controllers', ':', ' Service', 'Controller', ',', ' Client', 'Controller', '.', ' Also', ' maybe', ' repository', ' or', ' DTO', 's', '.', ' But', ' likely', ' just', ' summar', 'izing', ' endpoints', '.\n\n', 'Also', ' check', ' pom', '.xml', ' for', ' dependencies', ' like', ' spring', '-', 'boot', '-st', 'arter', '-web', ',', ' j', 'pa', ' etc', '.', " Let's", ' open', ' pom', '.xml', '.', '<|end|>', '<|start|>', 'assistant', '<|channel|>', 'analysis', ' to', '=', 'functions', '.file', '_read', ' code', '<|message|>', '{"', 'file', '_path', '":', '"/', 'home', '/test', 'user', '/Documents', '/test', '-demo', '/p', 'om', '.xml', '"}', '<|call|>']

Regarding <|constrain|>, I had speculated that it wasn't used in model training, but this appears to be partially incorrect as well. As I added more tools, there were cases where the gpt-oss model would output <|constrain|>. Since the current harmony changes remove the <|constrain|> token from being used, its appearance in output data is not itself a problem, but when it is included the model's output becomes extremely unstable, so I think it would be good if you could look into this issue in detail!
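For reference, the two header orderings being compared, written out fully (a sketch using the same functions.read_file example; the exact surrounding tokens are my assumption):
Recipient after channel (works):            <|start|>assistant<|channel|>commentary to=functions.read_file<|message|>...
Recipient before channel (drifts to analysis): <|start|>assistant to=functions.read_file<|channel|>commentary<|message|>...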
(force-pushed 9fb9c0b to fff423a)

I found the main encoding issues and the fixes are as follows:
'to' and ' to' encoding issue
There was an issue with incorrectly encoding the 'to' token (id 935) and ' to' token (id 316). During model training, it appears the model was configured to use id 316 when using tools, but when encoding produces 935 instead, there's a high probability the model will generate abnormal tokens. To resolve this, I modified it to consistently encode ' to'.
'<|constrain|>' encoding issue
It appears that during model training, this token was not used, and instead tool execution requests were trained using the ' json' token with a space included, similar to the ' to' token. Therefore, I removed '<|constrain|>' and modified it to ensure spaces are mandatory. This solves the issue where including '<|constrain|>' makes the model's function output extremely unstable and generates abnormal tokens as tool executions are repeated multiple times.
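A rough sketch of what this amounts to in the renderer (not the exact diff from this PR; the render_text_into / render_formatting_token_into calls and the content_type variable are taken from the review context shown elsewhere in this thread):

// Before (roughly): emit the <|constrain|> special token, then the content type.
// self.render_formatting_token_into(FormattingToken::ConstrainedFormat, into)?;
// self.render_text_into(content_type, into)?;

// After: drop the special token and always keep the leading space, so the
// content type encodes as ' json' (id 5701) rather than 'json' (id 4108).
self.render_text_into(format!(" {content_type}"), into)?;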
Main token investigation details (token id):
220 = ' '
12606 = 'comment'
815 = 'ary'
316 = ' to' (presumed to be a token trained specifically for tool requests)
935 = 'to'
28 = '='
6961 = '??'
4108 = 'json'
5701 = ' json' (presumed to be a token trained specifically for tool requests)
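For anyone who wants to double-check these ids, a small test modeled on the one posted earlier in this thread should do it (a sketch assuming the same load_harmony_encoding / encode_ordinary API; the expected values are the ids listed above):

#[test]
fn test_space_prefixed_tokens() {
    let encoding = load_harmony_encoding(HarmonyEncodingName::HarmonyGptOss).unwrap();
    // space-prefixed variants, presumed to be the forms seen in training
    assert_eq!(encoding.tokenizer.encode_ordinary(" to"), vec![316]);
    assert_eq!(encoding.tokenizer.encode_ordinary(" json"), vec![5701]);
    // bare variants that destabilize generation when they appear in the header
    assert_eq!(encoding.tokenizer.encode_ordinary("to"), vec![935]);
    assert_eq!(encoding.tokenizer.encode_ordinary("json"), vec![4108]);
    assert_eq!(encoding.tokenizer.encode_ordinary("="), vec![28]);
}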
openai/gpt-oss-20b
For those using vLLM: by applying the two additional PRs below and using the model configuration values from the test code, you can use the tool execution functionality with almost 100% reliability.
vllm-project/vllm#24954
vllm-project/vllm#24768