fix: Resolve response format corruption due to incorrect encoding #76
Conversation
I noticed this issue as well; in vLLM I get an error which I believe is related (related gh issue/comment: vllm-project/vllm#22515 (comment)).
UPDATE: I was able to fix this by fetching the latest model files from HF, since this commit fixed the generation_config.json file. My saved files were from before that commit.
@dkundel-openai
@andresC98
I encountered an issue where tool call parameters were sometimes incorrectly generated inside "reasoning_text" instead of "function_call". I can also confirm that updating generation_config.json did not help with this.
Hi team! @dkundel-openai @scott-oai Just a gentle ping on this PR. I understand you might be busy, but I'd appreciate any feedback when you get a chance. This fix addresses some encoding issues that could affect tool functionality stability. Happy to make any adjustments if needed!
Merge this pls
@levunet Although you experimentally found that removing <|constrain|> tends to stabilize tool calling, your assumption that 'the <|constrain|> token does not appear to have been used during training of the model' sounds too strong to me. Can you elaborate on this? Do you have any other observations to support your argument?
@levunet Perhaps a smaller diff like this may work, what do you think? Let me test this as well...

index 6a9305b..d04aad7 100644
--- a/src/encoding.rs
+++ b/src/encoding.rs
@@ -823,7 +823,8 @@ impl Render<Message> for HarmonyEncoding {
// next render the header recipient, if there is one
if let Some(recipient) = &message.recipient {
if recipient != "all" {
- self.render_text_into(format!(" to={recipient}"), into)?;
+ self.render_text_into(" to", into)?;
+ self.render_text_into(format!("={recipient}"), into)?;
}
}
@@ -844,7 +845,7 @@ impl Render<Message> for HarmonyEncoding {
self.render_text_into(" ", into)?;
self.render_formatting_token_into(FormattingToken::ConstrainedFormat, into)?;
if !rest.is_empty() {
- self.render_text_into(rest, into)?;
+ self.render_text_into(format!(" {rest}"), into)?;
}
} else {
self.render_text_into(format!(" {content_type}"), into)?;
Since I organized this based on my experimental results, I think my opinion may have come across as unintentionally too strong. To explain my earlier argument in more detail: I ran hundreds of tests using multiple tool calls with dozens of tools, and during that process I experimented with the <|constrain|> token while trying various tokens for stabilization.

The key peculiarity was that when this token was used, there was a very high probability of the model outputting ' json', which it does not use otherwise. In the opposite case, when the <|constrain|> token was not used, the probability of the model outputting '<|constrain|>' was very low; in my experiments it was never output, though I assumed the probability was just very low rather than zero.

Additionally, normal responses were generated only when the ' to' and ' json' tokens (with leading spaces) were used; when the 'to' and 'json' tokens without spaces were used by mistake, I confirmed that model errors accumulated and the response structure broke down. Based on these results, I concluded that the model tends to reproduce the learned data structure as-is, which led me to think that the <|constrain|> token was likely not used in training.
I'm reporting that this smaller patch was NOT enough to make vLLM-hosted gpt-oss-120b work with Codex. So it looks like your original patch is necessary!
@scott-oai
taking a look today, thanks for the patience |
Pull Request Overview
This PR addresses response format corruption caused by incorrect token encoding during tool execution in the Harmony model. The fix ensures the model uses the correct tokens that were specifically trained for tool requests, preventing abnormal token generation.
Key changes:
- Modified recipient encoding to use ' to' token (id 316) instead of 'to' token (id 935) by splitting the rendering into separate parts
- Removed '<|constrain|>' token usage and ensured mandatory spaces in content type rendering to align with training data
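To make one of the intended changes concrete, here is a rough before/after of the content-type portion of the rendered tool-call header as I read the description (an illustration, not output from the code; functions.get_weather is a placeholder recipient, and the recipient/channel ordering question discussed below is ignored here):
Before: ... to=functions.get_weather <|constrain|>json<|message|>
After:  ... to=functions.get_weather json<|message|>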
there is a lot to unpack here, i think it would be helpful to take this step by step.
'to' vs ' to'
> There was an issue with incorrectly encoding the 'to' token (id 935) and ' to' token (id 316). During model training, it appears the model was configured to use id 316 when using tools, but when encoding produces 935 instead, there's a high probability the model will generate abnormal tokens. To resolve this, I modified it to consistently encode ' to'.
The initial encoding of format!(" to={recipient}") vs " to" followed by format!("={recipient}") i believe should not make a difference... the bpe should always end up encoding " to" either way -- the only way this may not happen would be if there was a super long merge, but in that case you shouldn't see 935 either, you would see one super long token (which i doubt exists). so either way this should always produce 316. you can test it naively like:
#[test]
fn test_to_encodings() {
let encoding = load_harmony_encoding(HarmonyEncodingName::HarmonyGptOss).unwrap();
let mut tokens1 = encoding.tokenizer.encode_ordinary(" to");
tokens1.extend(encoding.tokenizer.encode_ordinary("=functions.get_weather"));
let tokens2 = encoding
.tokenizer
.encode_with_special_tokens(" to=functions.get_weather");
println!("tokens1: {:?}", tokens1);
println!("tokens2: {:?}", tokens2);
assert_eq!(tokens1, tokens2);
}
produces:
running 1 test
tokens1: [316, 28, 44580, 775, 170154]
tokens2: [316, 28, 44580, 775, 170154]
test tests::test_to_encodings ... ok
@levunet do you have any examples of situations that produce 935?
ordering of recipient/channel
is there a reason we changed the ordering here? i don't see a justification for this anywhere, and it does break some of the regression tests
'<|constrain|>' encoding issue
i will have to get back to you on this and will look into it today
self.render_text_into(channel, into)?;
}

// next render the header recipient, if there is one
have you intentionally reversed the order of recipient/channel?
First, regarding the 'to' vs ' to' issue: as you mentioned, there is no problem that would generate token 935. I think this was my mistake. This was an issue I checked a long time ago, and I tried to find the related content again but couldn't confirm it. I will revert to the code from before the change, and if I encounter the same issue again in the future, I will share it with you.

For the recipient/channel ordering, the problem occurred with consecutive tool calls. When tool execution requests are rendered with the recipient after the channel, as in '<|channel|>commentary to=functions.read_file', the model responds normally, but with the recipient before the channel, as in 'assistant to=functions.read_file <|channel|>', the gpt-oss model makes tool calls in the analysis channel.

Problematic data: ['<|channel|>', 'analysis', '<|message|>', 'We', ' need', ' to', ' produce', ' README', ' by', ' checking', ' basic', ' project', ' info', ' and', ' each', ' service', ' operation', '.\n\n', "Let's", ' inspect', ' other', ' controllers', ':', ' Service', 'Controller', ',', ' Client', 'Controller', '.', ' Also', ' maybe', ' repository', ' or', ' DTO', 's', '.', ' But', ' likely', ' just', ' summar', 'izing', ' endpoints', '.\n\n', 'Also', ' check', ' pom', '.xml', ' for', ' dependencies', ' like', ' spring', '-', 'boot', '-st', 'arter', '-web', ',', ' j', 'pa', ' etc', '.', " Let's", ' open', ' pom', '.xml', '.', '<|end|>', '<|start|>', 'assistant', '<|channel|>', 'analysis', ' to', '=', 'functions', '.file', '_read', ' code', '<|message|>', '{"', 'file', '_path', '":', '"/', 'home', '/test', 'user', '/Documents', '/test', '-demo', '/p', 'om', '.xml', '"}', '<|call|>']

Regarding <|constrain|>, I had speculated that it wasn't used in model training, but this appears to be partially incorrect as well. As I added more tools, there were cases where the gpt-oss model would output <|constrain|>. Since the current harmony changes remove the <|constrain|> token from being used, its appearance in output data is not itself a problem, but when it is included the model's output becomes extremely unstable, so I think it would be good if you could look into this issue in detail!
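For reference, the two header orderings being compared, written out fully (a sketch using the same functions.read_file example; the exact surrounding tokens are my assumption):
Recipient after channel (works):            <|start|>assistant<|channel|>commentary to=functions.read_file<|message|>...
Recipient before channel (drifts to analysis): <|start|>assistant to=functions.read_file<|channel|>commentary<|message|>...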
(force-pushed 9fb9c0b to fff423a)

I found the main encoding issues and the fixes are as follows:
'to' and ' to' encoding issue
There was an issue with incorrectly encoding the 'to' token (id 935) and ' to' token (id 316). During model training, it appears the model was configured to use id 316 when using tools, but when encoding produces 935 instead, there's a high probability the model will generate abnormal tokens. To resolve this, I modified it to consistently encode ' to'.
'<|constrain|>' encoding issue
It appears that during model training, this token was not used, and instead tool execution requests were trained using the ' json' token with a space included, similar to the ' to' token. Therefore, I removed '<|constrain|>' and modified it to ensure spaces are mandatory. This solves the issue where including '<|constrain|>' makes the model's function output extremely unstable and generates abnormal tokens as tool executions are repeated multiple times.
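A rough sketch of what this amounts to in the renderer (not the exact diff from this PR; the render_text_into / render_formatting_token_into calls and the content_type variable are taken from the review context shown elsewhere in this thread):

// Before (roughly): emit the <|constrain|> special token, then the content type.
// self.render_formatting_token_into(FormattingToken::ConstrainedFormat, into)?;
// self.render_text_into(content_type, into)?;

// After: drop the special token and always keep the leading space, so the
// content type encodes as ' json' (id 5701) rather than 'json' (id 4108).
self.render_text_into(format!(" {content_type}"), into)?;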
Main token investigation details (token id):
220 = ' '
12606 = 'comment'
815 = 'ary'
316 = ' to' (presumed to be a token trained specifically for tool requests)
935 = 'to'
28 = '='
6961 = '??'
4108 = 'json'
5701 = ' json' (presumed to be a token trained specifically for tool requests)
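For anyone who wants to double-check these ids, a small test modeled on the one posted earlier in this thread should do it (a sketch assuming the same load_harmony_encoding / encode_ordinary API; the expected values are the ids listed above):

#[test]
fn test_space_prefixed_tokens() {
    let encoding = load_harmony_encoding(HarmonyEncodingName::HarmonyGptOss).unwrap();
    // space-prefixed variants, presumed to be the forms seen in training
    assert_eq!(encoding.tokenizer.encode_ordinary(" to"), vec![316]);
    assert_eq!(encoding.tokenizer.encode_ordinary(" json"), vec![5701]);
    // bare variants that destabilize generation when they appear in the header
    assert_eq!(encoding.tokenizer.encode_ordinary("to"), vec![935]);
    assert_eq!(encoding.tokenizer.encode_ordinary("json"), vec![4108]);
    assert_eq!(encoding.tokenizer.encode_ordinary("="), vec![28]);
}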
openai/gpt-oss-20b
For those using vLLM: by applying the two additional PRs below and using the model configuration values from the test code, you can use the tool execution functionality with almost 100% reliability.
vllm-project/vllm#24954
vllm-project/vllm#24768