Description
There are issues with the way we format chat-style prompt templates, and they have been causing strange output from models that use those templates.
Most importantly, we are probably not preserving the EOS token after the assistant's turn, as required by the Llama-2 chat format here (which Mistral Instruct also uses) and as implemented in llama.cpp's server example here.
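To make the expectation concrete, here is a minimal sketch (not our actual backend code, just an illustration in Python) of how a multi-turn Llama-2 / Mistral Instruct prompt should be assembled, with `</s>` closing every completed assistant turn:

```python
# Minimal sketch of the Llama-2 / Mistral-Instruct chat layout.
# The key point: every finished assistant reply must be followed by </s>.

def build_llama2_prompt(turns, system=None):
    """turns: list of (user_message, assistant_reply) pairs; the last
    reply may be None for the turn the model is about to generate."""
    prompt = ""
    for i, (user, assistant) in enumerate(turns):
        inst = user
        if i == 0 and system:
            inst = f"<<SYS>>\n{system}\n<</SYS>>\n\n{user}"
        prompt += f"<s>[INST] {inst} [/INST]"
        if assistant is not None:
            # This </s> is the token we suspect is being dropped.
            prompt += f" {assistant}</s>"
    return prompt

print(build_llama2_prompt(
    [("Hello, who are you?", "I am an assistant."),
     ("What can you do?", None)],
    system="You are a helpful assistant.",
))
```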
If we're not preserving the standard </s> EOS token, we are probably also failing to preserve the <|im_end|> token used by ChatML models such as Mistral OpenOrca, since that becomes the EOS token in those models (except for MPT Chat, which has an incorrect upstream config.json IMO).
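For comparison, a similar sketch of the ChatML layout used by models like Mistral OpenOrca, where `<|im_end|>` plays the same role as `</s>` and therefore has to survive after each completed assistant turn:

```python
# Minimal sketch of the ChatML layout. <|im_end|> doubles as the EOS
# token for these models, so it must be preserved after assistant turns.

def build_chatml_prompt(messages):
    """messages: list of (role, content) tuples, e.g. ("user", "Hi")."""
    prompt = ""
    for role, content in messages:
        prompt += f"<|im_start|>{role}\n{content}<|im_end|>\n"
    # Open the assistant turn the model should complete.
    prompt += "<|im_start|>assistant\n"
    return prompt

print(build_chatml_prompt([
    ("system", "You are a helpful assistant."),
    ("user", "Hello!"),
    ("assistant", "Hi there, how can I help?"),
    ("user", "Tell me a joke."),
]))
```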
The other potential issue is the way we process special tokens, especially </s>, in the prompt template. TinyLlama-Chat has been seen printing a literal </s>, which might be a sign that we are not tokenizing it correctly (screenshots from @ThiloteE below; a small sketch after them illustrates the special-vs-literal distinction):


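To show what that distinction looks like in practice, here is a rough illustration using the Hugging Face tokenizer. This is an assumption for demonstration only: our backend tokenizes through llama.cpp, and the model id and the `split_special_tokens` option are just one way to reproduce the idea with recent transformers releases.

```python
# Two ways "</s>" in a prompt template can be tokenized. If the template
# text is tokenized literally instead of being mapped to the special EOS
# token, the model never sees a real EOS and can echo "</s>" as plain text.
from transformers import AutoTokenizer

MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # used only because it shows the symptom

# Default behaviour: "</s>" in the text is recognised as the special EOS token.
tok_special = AutoTokenizer.from_pretrained(MODEL)
print(tok_special.encode("</s>", add_special_tokens=False),
      "EOS id:", tok_special.eos_token_id)        # e.g. [2] EOS id: 2

# split_special_tokens=True forces the text to be tokenized literally,
# producing ordinary sentencepiece pieces instead of the EOS id.
tok_literal = AutoTokenizer.from_pretrained(MODEL, split_special_tokens=True)
ids = tok_literal.encode("</s>", add_special_tokens=False)
print(ids, tok_literal.convert_ids_to_tokens(ids))  # e.g. ['▁</', 's', '>']
```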