fix: handle malformed messages missing <|message|> token #82
StreamableParser now gracefully handles LLM output where stop tokens appear before the expected <|message|> token. The parser extracts header metadata (channel, recipient, content_type) from the accumulated tokens and treats the remainder as message content. It also refactors header parsing into a shared helper to eliminate duplication.

Signed-off-by: Ben Browning <[email protected]>
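To make the recovery idea concrete, here is a minimal, string-based sketch in Rust. It is not the PR's implementation: the real StreamableParser works on token IDs, whereas this sketch assumes the header has already been decoded to text and that header fields are separated by whitespace. `RecoveredHeader`, `recover_message`, and `split_word` are hypothetical names, and the `to=functions.get_weather` / `<|constrain|>json` inputs are illustrative.

```rust
/// Header fields recovered from a message that never produced `<|message|>`.
/// Hypothetical type for this sketch; not the openai-harmony API.
#[derive(Debug, Default, PartialEq)]
pub struct RecoveredHeader {
    pub channel: Option<String>,
    pub recipient: Option<String>,
    pub content_type: Option<String>,
}

/// Splits off the first whitespace-delimited word, returning (word, rest).
fn split_word(s: &str) -> (&str, &str) {
    match s.find(char::is_whitespace) {
        Some(i) => (&s[..i], s[i..].trim_start()),
        None => (s, ""),
    }
}

/// Hypothetical recovery: given the decoded text accumulated after the role
/// when a stop token arrived before `<|message|>`, pull out the header fields
/// and treat whatever is left as the message content.
pub fn recover_message(accumulated: &str) -> (RecoveredHeader, String) {
    let mut header = RecoveredHeader::default();
    let mut rest = accumulated.trim_start();

    // `<|channel|>analysis ...`: the first word after the marker is the channel.
    if let Some(after) = rest.strip_prefix("<|channel|>") {
        let (word, tail) = split_word(after);
        header.channel = Some(word.to_string());
        rest = tail;
    }
    // Optional `to=...` recipient.
    if let Some(after) = rest.strip_prefix("to=") {
        let (word, tail) = split_word(after);
        header.recipient = Some(word.to_string());
        rest = tail;
    }
    // Optional `<|constrain|>...` content type.
    if let Some(after) = rest.strip_prefix("<|constrain|>") {
        let (word, tail) = split_word(after);
        header.content_type = Some(word.to_string());
        rest = tail;
    }
    // Everything that remains is treated as message content.
    (header, rest.to_string())
}
```

For example, `recover_message("<|channel|>analysis We need to respond to user.")` yields channel `analysis` and content `We need to respond to user.`, which is the shape of the failing output quoted later in this thread. A simplification to keep in mind: content that itself begins with `to=` would be misread as a recipient.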
This resolves the issue I reported as #80 and should also resolve many, if not most, of the real-world scenarios that can lead to #38. Sometimes, for whatever reason, the model is not outputting In essence, this makes the
Hi! I've found the same problem with gpt-oss-120b. What should I do to solve it?
@gonmarfer If your problem is identical to the one I solve here, then building your own openai-harmony library that includes this change would be reasonable. I'm submitting this PR in the hope of getting it merged into the next release of this library, so that vLLM can then pull in the newer version and fix some bugs with gpt-oss models that surface there because of this issue. If you use another inference server, it may also need changes.
Thanks! Do you know if there's a fix for this other one?
@gonmarfer That is one of the error messages this PR fixes. I can't say for sure whether it would fix your specific case without seeing the exact model output that triggered the error, but that's generally a class of errors this PR resolves.
Thanks! So basically, is this the only patch I need to apply, modifying the harmony_utils.py script?
So far I've corrected that file, but I've also received the following error. The complete message is:
File "/home/gfernan2/llama4-env/lib/python3.10/site-packages/openai_harmony/__init__.py", line 525, in parse_messages_from_completion_tokens
raw_json: str = self._inner.parse_messages_from_completion_tokens(
openai_harmony.HarmonyError: unexpected tokens remaining in message header: ["……………….........………...……………....……………………...……......………………...…………………………...………...………......………—………………………??……………………………...……...……….…………………………..…………………???………………………...…………………...…...…………….……...………………………...…………………………………………………………………………………………………………………………...………………………………………...……......………………………………………………………...…………………………………………………………………………...………………...……...…………………………………………………………...…………...……………...……...……………………………………………………………………………………..…………….………………………………………………………………………………………………………………………………………………………………………………………......……………………………………………………………………………………", "…...………...……………………………………………………………………………………………",
"……………...……………………………………………………………………………………………………………………………………………………………………………………………………………...…………………………………", "………...……………………….…………………………………", "…", "………………………………………………………………………………………………………………………………", "……………………………………………………………………………………………………………...……………………………………………………………………………………………………………………………….………………………………………...………………………....…………………………………………………………………………………………………………….…………………………………………………………..……………………………………………………………………………………………………………………….……………………………………………………...……………………………………………………………………………….……………………………………………………………………………………...……………………………………………………………….…...…………………………………………………………………………………………...……………………………………………………………………………………………………………………………………...………………………………………………………………….……………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………", "The", "…….…………...……………………………………………...……………………………………………………………………………………………………………………………...……………………………………………………………………………………………………………………......……...………………………………………………………...………………………………...…………………………...…………………………………………………...\"",
"We", "need", "to", "respond", "to", "user.", "They", "are", "experiencing", "sadness", "and", "crying", "after", "baby,", "uncertain", "about", "postpartum", "depression", "vs", "baby", "blues.", "We", "should", "provide", "supportive", "information,", "encourage", "seeking", "professional", "help,", "explain", "differences", "between", "baby", "blues", "and", "postpartum", "depression,", "typical", "duration,", "symptoms,", "risk", "factors,", "and", "steps", "to", "take.", "Also", "mention", "postpartum", "anxiety,", "postpartum", "OCD,", "postpartum", "PTSD.", "Provide", "resources.", "Encourage", "self-care,", "support", "network,", "postpartum", "check-ups,", "postpartum", "mental", "health", "screening.", "Provide", "coping", "strategies.", "Provide", "resources", "for", "postpartum", "depression", "screening", "tools", "like", "EPDS.", "Provide", "encouragement.", "Mention", "that", "postpartum", "depression", "can", "be", "treated", "successfully.", "Provide", "signs", "to", "watch", "for.", "Encourage", "not", "to", "self-diagnose.",
"Encourage", "seeking", "help", "from", "OB-GYN,", "midwife,", "mental", "health", "professional.", "Also", "mention", "telehealth", "options.", "Provide", "phone", "numbers", "for", "crisis", "lines.", "Provide", "self-care", "suggestions:", "rest,", "nutrition,", "support,", "therapy,", "medication,", "exercise,", "etc.", "Provide", "mention", "of", "postpartum", "support", "groups.", "We", "need", "to", "be", "careful", "not", "to", "give", "medical", "advice.", "Provide", "general", "info.", "Also", "we", "need", "to", "comply", "with", "policy:", "It's", "a", "medical", "context.", "We", "can", "give", "general", "information,", "but", "not", "specific", "diagnosis.", "We", "can", "recommend", "professional", "help.", "The", "user", "is", "asking", "for", "help.", "We", "should", "respond", "in", "a", "supportive", "tone.", "We", "should", "also", "note", "that", "the", "user", "used", "a", "garbled", "text.", "We", "ignore", "that.", "Provide", "a", "clear", "answer.", "We", "should", "mention", "the", "difference:", "baby", "blues", "typically", "starts", "within", "2-3", "days", "postpartum,", "peaks", "at", "4-7", "days,", "resolves", "by", "2", "weeks.", "Postpartum", "depression", "can", "start", "anytime", "within", "first", "4", "weeks", "postpartum,", "lasts", "longer,", "more", "severe.", "Symptoms:",
"persistent", "sadness,", "loss", "of", "interest,", "guilt,", "anxiety,", "appetite", "changes,", "sleep", "disturbances,", "thoughts", "of", "harming", "self", "or", "baby.", "Provide", "screening", "EPDS.", "Provide", "resources:", "Postpartum", "Support", "International", "(PSI),", "Postpartum", "Depression", "Helpline.", "Provide", "hotlines:", "988.", "Provide", "steps:", "talk", "to", "provider,", "consider", "therapy,", "medication,", "support", "groups.", "Also", "mention", "postpartum", "anxiety.", "Provide", "coping:", "rest,", "support,", "self-care,", "healthy", "diet,", "avoid", "alcohol,", "talk", "to", "partner.", "Also", "mention", "postpartum", "OCD:", "intrusive", "thoughts", "about", "baby", "safety,", "repeated", "cleaning.", "Also", "mention", "postpartum", "PTSD:", "flashbacks,", "nightmares,", "hypervigilance.", "Encourage", "not", "to", "isolate,", "to", "reach", "out", "to", "friends,", "family,", "partner.", "Also", "mention", "postpartum", "depression", "can", "be", "treated", "with", "SSRIs,", "therapy,", "CBT,", "EMDR.", "Provide", "mention", "of", "postpartum", "depression", "screening.", "Encourage", "to", "schedule", "postpartum", "visit.", "Also", "mention", "postpartum", "depression", "is", "common:", "about", "10-15%.", "Provide", "reassurance", "that", "it's", "treatable.", "Also", "mention", "postpartum", "blues", "is", "normal,", "but", "postpartum", "depression", "is", "not.", "Also", "mention", "postpartum", "depression", "can", "lead", "to", "negative", "outcomes", "if", "untreated.", "Also", "mention", "postpartum", "depression", "can", "cause", "physical", "symptoms:", "fatigue,", "low",
"energy.", "Also", "mention", "postpartum", "depression", "may", "lead", "to", "impaired", "bonding.", "We", "need", "to", "comply", "with", "policy:", "no", "diagnosing,", "no", "prescribing.", "Provide", "general", "info.", "Encourage", "professional", "help.", "Also", "mention", "that", "postpartum", "depression", "can", "be", "triggered", "by", "hormonal", "changes,", "sleep", "deprivation,", "stress,", "lack", "of", "support.", "Also", "mention", "postpartum", "depression", "can", "be", "associated", "with", "postpartum", "anxiety.", "Also", "mention", "postpartum", "depression", "risk", "factors:", "previous", "depression,", "family", "history,", "lack", "of", "support,", "traumatic", "birth.", "Also", "mention", "postpartum", "depression", "can", "be", "postpartum", "psychosis", "(rare,", "severe).", "Provide", "signs:", "hallucinations,", "delusions,", "mania.", "Encourage", "immediate", "medical", "attention.", "Also", "mention", "postpartum", "depression", "can", "be", "moderate", "or", "severe.", "Provide", "guidelines.", "Encourage", "using", "EPDS.", "Also", "mention", "postpartum", "depression", "can", "be", "treated", "with", "therapy,", "medication,", "support", "groups.", "Also", "mention",
"postpartum", "depression", "can", "be", "self-managed", "with", "exercise,", "sleep", "hygiene,", "healthy", "diet,", "social", "support.", "Also", "mention", "postpartum", "depression", "can", "be", "integrated", "with", "postpartum", "care.", "Also", "mention", "postpartum", "depression", "can", "be", "assessed", "by", "OB-GYN,", "midwife,", "mental", "health", "professional.",
"Also", "mention", "postpartum", "depression", "can", "be", "addressed", "via", "postpartum", "check-ups.", "Also", "mention", "postpartum", "depression", "can", "be", "integrated", "with", "postpartum", "support.", "Also", "mention", "postpartum", "depression", "can", "be", "addressed", "by", "postpartum", "depression", "screening.", "Also", "mention", "postpartum", "depression", "can", "be", "addressed", "by", "postpartum", "depression", "helplines.", "Also", "mention", "postpartum", "depression", "can", "be", "addressed", "by",
"postpartum", "support", "groups.", "Also", "mention", "postpartum", "depression", "can", "be", "addressed", "by", "postpartum", "mental", "health", "professionals.", "Also", "mention", "postpartum", "depression", "can", "be", "addressed", "by", "postpartum", "mental", "health", "apps.", "Also", "mention", "postpartum", "depression", "can", "be", "addressed", "by", "postpartum", "mental", "health", "resources.", "Also", "mention", "postpartum", "depression", "can", "be", "addressed", "by", "postpartum", "mental", "health", "support.", "Also", "mention", "postpartum", "depression", "can", "be", "addressed", "by", "postpartum", "mental", "health", "support.", "Also", "mention", "postpartum", "depression", "can", "be", "addressed", "by", "postpartum", "mental", "health", "support.", "We", "need", "to", "respond", "clearly.", "We", "should", "avoid", "repeating", "too", "much,", "but", "mention", "key", "differences.", "We", "should",
"ask", "if", "they", "have", "been", "experiencing", "other", "symptoms:", "appetite", "changes,", "sleep", "changes,", "guilt,", "anxiety,", "intrusive", "thoughts,", "or", "thoughts", "of", "harming", "baby.", "We", "should", "ask", "if", "they", "have", "had", "any", "thoughts", "of", "harming", "themselves", "or", "baby.", "If", "yes,", "urgent", "help.", "We", "should", "mention", "postpartum", "depression", "is", "treatable,", "encourage", "professional", "help.", "We", "should", "mention", "postpartum", "depression", "is", "not", "a", "sign", "of", "weakness.", "We", "should", "mention", "postpartum", "depression", "can", "affect", "mother,", "baby,", "partner.", "We", "should", "mention", "postpartum", "depression", "can", "lead",
"to", "postpartum", "anxiety.", "We", "should", "mention", "postpartum", "depression", "can", "lead", "to", "postpartum", "OCD.", "We", "should", "mention", "postpartum", "depression", "can", "lead", "to", "postpartum", "PTSD.", "We", "should", "mention", "postpartum", "depression", "can", "lead", "to", "postpartum", "psychosis.", "We", "should", "mention", "postpartum", "depression", "can", "lead", "to", "postpartum", "mania.", "We", "should", "mention", "postpartum", "depression", "can", "lead", "to", "postpartum", "mania.", "We", "should", "mention", "postpartum", "depression", "can", "lead", "to", "postpartum", "mania.", "We", "should", "mention", "postpartum", "depression", "can",
"lead", "to", "postpartum", "mania.", "We", "should", "mention", "postpartum", "depression", "can", "lead", "to", "postpartum", "mania.", "We", "should", "mention", "postpartum", "depression", "can", "lead", "to", "postpartum", "mania.", "Ok.", "We", "need", "to", "structure", "the", "answer", "with", "headings:", "Understanding", "Baby", "Blues", "vs", "Postpartum", "Depression,", "Key", "Symptoms,", "When", "to", "Seek", "Help,", "How", "to", "Get", "Help,", "Self-Care", "and", "Support,", "Resources.", "We", "can", "also", "provide", "a", "short", "screening", "question:", "\"Have", "you",
"felt", "sad,", "hopeless,", "or", "worthless", "for", "more", "than", "a", "few", "days?", "Have", "you", "lost", "interest?", "Do", "you", "feel", "guilty?", "Are", "you", "sleeping", "poorly?", "Are", "you", "having", "thoughts", "of", "harming", "yourself", "or", "baby?\"", "If", "yes,", "talk", "to", "provider.", "We", "need", "to", "keep", "it", "supportive", "and", "non-judgmental.", "We", "need", "to", "mention", "mental", "health", "professional.", "Ok.", "Also", "mention", "postpartum", "support", "groups.", "Ok,", "let's", "craft", "the", "answer.", "We", "should", "avoid", "overloading", "with", "too", "many", "details.", "Provide", "concise", "but", "thorough.", "Use", "bullet", "points.", "Ok.", "Now"]
@gonmarfer No, it's the changes in this PR (the one you're commenting on) that fix the particular error you mentioned. You'll need to apply the changes to the Rust code in this repository, which means recompiling the Rust library locally and then updating the openai-harmony library in your installation to use the version you built yourself.
Thanks so much! I did what you described, but now I get this error:
File "my_env/lib/python3.10/site-packages/openai_harmony/__init__.py", line 525, in parse_messages_from_completion_tokens
raw_json: str = self._inner.parse_messages_from_completion_tokens(
openai_harmony.HarmonyError: Unknown role: analysis
What model output caused this error? If I can reproduce it locally, I can track things down better. On the surface, it looks like a channel name is accidentally ending up in the role field, either due to poor model output or something else. I believe hitting this would require a model output that's missing the I haven't seen that specific type of failure myself, but it is another type of model failure that we could perhaps recover from.
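One way to picture `Unknown role: analysis`: if a header parser reads the role from a slot that, in malformed output, holds a channel name, strict role lookup fails on `analysis`. The sketch below is hypothetical (`Role`, `parse_role`, and `parse_role_lenient` are illustrative, not openai-harmony's code) and shows one possible recovery of the kind discussed above: treating a known channel name (`analysis`, `commentary`, `final`) in the role slot as an assistant message on that channel.

```rust
/// Roles a header may start with (hypothetical mirror of the Harmony roles).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Role {
    User,
    Assistant,
    System,
    Developer,
    Tool,
}

/// Strict lookup: anything that is not a known role is an error.
pub fn parse_role(s: &str) -> Result<Role, String> {
    match s {
        "user" => Ok(Role::User),
        "assistant" => Ok(Role::Assistant),
        "system" => Ok(Role::System),
        "developer" => Ok(Role::Developer),
        "tool" => Ok(Role::Tool),
        other => Err(format!("Unknown role: {other}")),
    }
}

/// Hypothetical lenient fallback: a known channel name in the role slot is
/// interpreted as an assistant message on that channel.
pub fn parse_role_lenient(s: &str) -> Result<(Role, Option<&str>), String> {
    match parse_role(s) {
        Ok(role) => Ok((role, None)),
        Err(_) if matches!(s, "analysis" | "commentary" | "final") => {
            Ok((Role::Assistant, Some(s)))
        }
        Err(e) => Err(e),
    }
}
```

Here `parse_role("analysis")` returns `Err("Unknown role: analysis")`, while `parse_role_lenient("analysis")` returns `Ok((Role::Assistant, Some("analysis")))`. Whether such a fallback is safe is a design question, since it could also hide genuinely malformed output.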
@codex review
@dkundel-openai Can you take a look at this PR?
Agreed, please. I'm waiting for these fixes so they can be integrated into vLLM.
I also encountered the following form of error on certain (long, multi-turn) conversations: Or the model just completely forgets to use the analysis channel. It happens on openai/gpt-oss-120b, unsloth/gpt-oss-120b-BF16, and fine-tuned Unsloth BF16 models. It breaks vLLM inference (response formation) during Harmony format parsing.
I think we should introduce these changes under a non-strict mode flag: a `strict: bool` that only allows this behavior when `strict` is false, with `strict` defaulting to true. Rather than adding a field to `RenderConversationConfig`, we should add a new `parse_messages_from_completion_tokens_with_options<I>` and a struct `ParseOptions { strict: bool }` that implements `Default` with `strict = true`, and have `parse_messages_from_completion_tokens` call it with the default value. This would ensure that we are not unintentionally breaking anyone who relies on this as a validating parser. If you are willing to make the addition, we can merge this; otherwise, I'm happy to make the update this weekend and we can have it merged by next week.
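The suggested API shape can be sketched in Rust as follows. `ParseOptions { strict: bool }`, its `Default` of `strict = true`, and the `_with_options` naming come from the comment; everything else is a hypothetical simplification. These are free functions over plain string tokens returning a single `Message`, not openai-harmony's actual methods or signatures, and the non-strict branch uses a crude stand-in (first token as header) rather than the PR's header recovery.

```rust
/// Parsing options; `strict` defaults to true so existing callers keep the
/// validating behavior.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ParseOptions {
    pub strict: bool,
}

impl Default for ParseOptions {
    fn default() -> Self {
        ParseOptions { strict: true }
    }
}

/// Hypothetical simplified message: header tokens and content tokens.
#[derive(Debug, PartialEq)]
pub struct Message {
    pub header: Vec<String>,
    pub content: Vec<String>,
}

/// Existing entry point: signature unchanged, delegates with default options.
pub fn parse_messages_from_completion_tokens<I>(tokens: I) -> Result<Message, String>
where
    I: IntoIterator<Item = String>,
{
    parse_messages_from_completion_tokens_with_options(tokens, ParseOptions::default())
}

/// New entry point that takes options (simplified, hypothetical body).
pub fn parse_messages_from_completion_tokens_with_options<I>(
    tokens: I,
    options: ParseOptions,
) -> Result<Message, String>
where
    I: IntoIterator<Item = String>,
{
    let tokens: Vec<String> = tokens.into_iter().collect();
    match tokens.iter().position(|t| t.as_str() == "<|message|>") {
        // Well-formed: header, then `<|message|>`, then content.
        Some(i) => Ok(Message {
            header: tokens[..i].to_vec(),
            content: tokens[i + 1..].to_vec(),
        }),
        // Missing `<|message|>` in strict mode: reject, as before.
        None if options.strict => Err(format!(
            "unexpected tokens remaining in message header: {:?}",
            tokens
        )),
        // Missing `<|message|>` in non-strict mode: crude stand-in that keeps
        // the first token as the header and treats the rest as content.
        None => {
            let split = tokens.len().min(1);
            Ok(Message {
                header: tokens[..split].to_vec(),
                content: tokens[split..].to_vec(),
            })
        }
    }
}
```

Because the original function only forwards `ParseOptions::default()`, existing callers keep the strict behavior, and only callers that explicitly pass `ParseOptions { strict: false }` opt in to the lenient path.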
Thanks for the feedback. Making this behavior optional to preserve the exact previous semantics is a reasonable suggestion. Inside vLLM (where I ultimately need to plumb this through), we directly use Rust is quite new to me, but I think I follow what was suggested about implementing
This moves the less strict parsing behavior behind a new ParseOptions.strict value. It keeps all existing Rust signatures the same, adding new `_with_options` variants where options need to be passed in. For the Python and wasm bindings, it adds `strict` as an optional kwarg.

Signed-off-by: Ben Browning <[email protected]>
Force-pushed from b3b7ffe to 3dfde17.
@scott-oai I believe I addressed your concern about maintaining the previous behavior and making the new behavior opt-in. I had to make a judgement call on whether I wanted to surface and pass in a I believe this new change keeps all existing behavior while allowing clients to opt in to the less strict parsing. I'm not entirely sure how to test the wasm side of things, but the additional Rust and Python tests all pass with this change.
Thanks! |