
Fix multi-turn chat-style prompt formatting/tokenization #1961

@cebtenzzre

Description


There are issues with the way we format chat-style prompt templates, and they have been causing strange output from models that use these templates.

Most importantly, we are probably not preserving the EOS after the assistant's turn, as required by the Llama-2 chat format here (Mistral Instruct uses the same format) and as implemented in llama.cpp's server example here.
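
For reference, a minimal sketch of how a multi-turn Llama-2 chat prompt could be assembled so the `</s>` that closes each completed assistant reply is kept. The names here are illustrative, not our actual code, and whether `<s>`/`</s>` end up as literal text or as BOS/EOS token ids is exactly the tokenization question raised further down:

```cpp
#include <string>
#include <vector>

struct Turn { std::string user; std::string assistant; };

// Illustrative only: assemble a Llama-2 chat prompt, keeping the EOS (</s>)
// that terminates every completed assistant turn.
std::string buildLlama2Prompt(const std::vector<Turn> &history,
                              const std::string &newUserMsg) {
    std::string prompt;
    for (const Turn &t : history) {
        // Each finished exchange ends with </s>; dropping it is the bug
        // described above.
        prompt += "<s>[INST] " + t.user + " [/INST] " + t.assistant + "</s>";
    }
    // Open the next turn without an assistant reply yet.
    prompt += "<s>[INST] " + newUserMsg + " [/INST]";
    return prompt;
}
```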

If we're not preserving the standard </s> EOS token, we are probably also failing to preserve the <|im_end|> token used by ChatML models such as Mistral OpenOrca, since that becomes the EOS token in those models (except for MPT Chat, which has an incorrect upstream config.json IMO).
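
For comparison, the ChatML layout used by models like Mistral OpenOrca looks roughly like the sketch below; here `<|im_end|>` doubles as the model's EOS token, so dropping it after the assistant's turn would be the same bug in a different disguise (again, the function and struct names are made up for illustration):

```cpp
#include <string>
#include <vector>

struct ChatTurn { std::string role; std::string content; };

// Illustrative sketch of ChatML: every turn, including the assistant's,
// ends with <|im_end|>, which is also the EOS token in these models.
std::string buildChatMLPrompt(const std::vector<ChatTurn> &history) {
    std::string prompt;
    for (const ChatTurn &t : history)
        prompt += "<|im_start|>" + t.role + "\n" + t.content + "<|im_end|>\n";
    prompt += "<|im_start|>assistant\n"; // prompt the next assistant reply
    return prompt;
}
```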

The other potential issue is the way we are processing special tokens, especially </s>, in the prompt template. TinyLlama-Chat has been seen printing a literal </s>, which may be a sign that we are not tokenizing it correctly (screenshots from @ThiloteE below; see the sketch after them):

[screenshots: TinyLlama-Chat responses containing a literal </s>]
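
One way the literal `</s>` could leak through: if the template string is tokenized as plain text, `</s>` is split into several ordinary tokens instead of becoming the single EOS id, and the model then learns to echo it back as text. A hedged sketch of a split-and-substitute approach is below; the `tokenizeText` callback and the token ids are placeholders, not our real API:

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

using Token = int32_t;

// Sketch: tokenize a prompt template so that known special tokens map to
// their single token ids instead of being split into literal-text pieces.
std::vector<Token> tokenizeWithSpecials(
        const std::string &text,
        const std::map<std::string, Token> &specials,   // e.g. {"</s>", 2}
        const std::function<std::vector<Token>(const std::string &)> &tokenizeText) {
    std::vector<Token> out;
    size_t pos = 0;
    while (pos < text.size()) {
        // Find the earliest occurrence of any special-token string.
        size_t bestPos = std::string::npos;
        auto best = specials.end();
        for (auto it = specials.begin(); it != specials.end(); ++it) {
            size_t p = text.find(it->first, pos);
            if (p != std::string::npos && p < bestPos) { bestPos = p; best = it; }
        }
        if (best == specials.end()) {
            // No more special tokens: tokenize the remainder as ordinary text.
            auto tail = tokenizeText(text.substr(pos));
            out.insert(out.end(), tail.begin(), tail.end());
            break;
        }
        if (bestPos > pos) {
            auto chunk = tokenizeText(text.substr(pos, bestPos - pos));
            out.insert(out.end(), chunk.begin(), chunk.end());
        }
        out.push_back(best->second);            // emit the special token's id directly
        pos = bestPos + best->first.size();
    }
    return out;
}
```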
