Description
There are issues with the way we format chat-style prompt templates, and they have been causing strange output from models that use those templates.
Most importantly, we are probably not preserving the EOS token after the assistant's turn, as required by the Llama-2 chat format here (which Mistral Instruct also uses) and as implemented in llama.cpp's server example here.
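To make the expectation concrete, here is a minimal sketch (not our actual backend code, just an illustration in Python) of how a multi-turn Llama-2 / Mistral Instruct prompt should be assembled, with `</s>` closing every completed assistant turn:

```python
# Minimal sketch of the Llama-2 / Mistral-Instruct chat layout.
# The key point: every finished assistant reply must be followed by </s>.

def build_llama2_prompt(turns, system=None):
    """turns: list of (user_message, assistant_reply) pairs; the last
    reply may be None for the turn the model is about to generate."""
    prompt = ""
    for i, (user, assistant) in enumerate(turns):
        inst = user
        if i == 0 and system:
            inst = f"<<SYS>>\n{system}\n<</SYS>>\n\n{user}"
        prompt += f"<s>[INST] {inst} [/INST]"
        if assistant is not None:
            # This </s> is the token we suspect is being dropped.
            prompt += f" {assistant}</s>"
    return prompt

print(build_llama2_prompt(
    [("Hello, who are you?", "I am an assistant."),
     ("What can you do?", None)],
    system="You are a helpful assistant.",
))
```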
If we're not preserving the standard </s> EOS token, we are probably also failing to preserve the <|im_end|> token used by ChatML models such as Mistral OpenOrca, since that becomes the EOS token in those models (except for MPT Chat, which has an incorrect upstream config.json IMO).
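For comparison, a similar sketch of the ChatML layout used by models like Mistral OpenOrca, where `<|im_end|>` plays the same role as `</s>` and therefore has to survive after each completed assistant turn:

```python
# Minimal sketch of the ChatML layout. <|im_end|> doubles as the EOS
# token for these models, so it must be preserved after assistant turns.

def build_chatml_prompt(messages):
    """messages: list of (role, content) tuples, e.g. ("user", "Hi")."""
    prompt = ""
    for role, content in messages:
        prompt += f"<|im_start|>{role}\n{content}<|im_end|>\n"
    # Open the assistant turn the model should complete.
    prompt += "<|im_start|>assistant\n"
    return prompt

print(build_chatml_prompt([
    ("system", "You are a helpful assistant."),
    ("user", "Hello!"),
    ("assistant", "Hi there, how can I help?"),
    ("user", "Tell me a joke."),
]))
```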
The other potential issue is the way we process special tokens, especially </s>, in the prompt template. TinyLlama-Chat has been seen printing a literal </s>, which might be a sign that we are not tokenizing it correctly (screenshots from @ThiloteE below; a small sketch after them illustrates the special-vs-literal distinction):


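To show what that distinction looks like in practice, here is a rough illustration using the Hugging Face tokenizer. This is an assumption for demonstration only: our backend tokenizes through llama.cpp, and the model id and the `split_special_tokens` option are just one way to reproduce the idea with recent transformers releases.

```python
# Two ways "</s>" in a prompt template can be tokenized. If the template
# text is tokenized literally instead of being mapped to the special EOS
# token, the model never sees a real EOS and can echo "</s>" as plain text.
from transformers import AutoTokenizer

MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # used only because it shows the symptom

# Default behaviour: "</s>" in the text is recognised as the special EOS token.
tok_special = AutoTokenizer.from_pretrained(MODEL)
print(tok_special.encode("</s>", add_special_tokens=False),
      "EOS id:", tok_special.eos_token_id)        # e.g. [2] EOS id: 2

# split_special_tokens=True forces the text to be tokenized literally,
# producing ordinary sentencepiece pieces instead of the EOS id.
tok_literal = AutoTokenizer.from_pretrained(MODEL, split_special_tokens=True)
ids = tok_literal.encode("</s>", add_special_tokens=False)
print(ids, tok_literal.convert_ids_to_tokens(ids))  # e.g. ['▁</', 's', '>']
```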