Conversation

@pathorn commented Dec 7, 2023

Tested on mistralai/Mistral-7B-Instruct-v0.1

Also tested with Llama and it still seems to work.
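
(For reference, a generic way to sanity-check the model id outside the server is a plain transformers generation call. This is only an illustrative smoke test, not the harness used for this PR.)

from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative smoke test only; the serving code path in this PR is separate.
model_id = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "[INST] What is the capital of France? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))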

# Prefill is done: clear the prefill-only indices and account for the
# newly generated token in the maximum sequence length.
batch.prefill_next_token_indices = None
batch.max_seqlen = batch.max_seqlen + 1

# Record current KV-cache utilization on the batch.
batch.kv_cache_usage = CACHE_MANAGER.usage()

I'm a bit worried that this call might cause a perf drop. Can you try with and without it?
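
One way to answer that is a per-step timing comparison. The sketch below is a hypothetical harness: decode_step stands in for whatever decode entry point runs this code path, batch is the live serving batch, and only CACHE_MANAGER.usage() comes from the diff above.

import time
import torch

def time_decode(batch, steps=100, query_usage=True):
    decode_step(batch)  # warm-up; decode_step is a hypothetical stand-in
    torch.cuda.synchronize()  # flush pending GPU work before timing
    start = time.perf_counter()
    for _ in range(steps):
        decode_step(batch)
        if query_usage:
            batch.kv_cache_usage = CACHE_MANAGER.usage()  # the call under test
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / steps

with_usage = time_decode(batch, query_usage=True)
without_usage = time_decode(batch, query_usage=False)
print(f"with usage(): {with_usage * 1e3:.3f} ms/step, "
      f"without: {without_usage * 1e3:.3f} ms/step")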

@pathorn changed the base branch from main to kv-cache December 8, 2023 01:47
@NikolaBorisov changed the title from "Add support for MIstral" to "Add support for Mistral" Dec 15, 2023