Open
Labels
Investigating; KV-Cache Management (kv-cache management for efficient LLM inference); triaged (Issue has been triaged by maintainers)
Description
It would be great to support this new model! https://cohere.com/blog/command-a
The model uses an unusual architecture: some layers use sliding window attention, while others use global attention with no position embeddings. I read through the documentation on how to add a model, but I'm still unsure how to implement this myself.
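To make the request more concrete, here is a minimal sketch of the interleaved layer pattern as I understand it. It is not taken from Cohere's code or from this repository: `LayerSpec`, `build_layer_specs`, `causal_mask`, and `kv_cache_len` are hypothetical names, and the window size and the "one global layer every N layers" ratio are assumed parameters, not confirmed values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LayerSpec:
    """Attention settings for one decoder layer (hypothetical structure)."""
    sliding_window: Optional[int]   # None means global (full causal) attention
    use_position_embeddings: bool   # global layers skip position embeddings


def build_layer_specs(num_layers: int, window: int, global_every: int) -> List[LayerSpec]:
    """Assumed pattern: every `global_every`-th layer is global, the rest slide."""
    specs = []
    for i in range(num_layers):
        is_global = (i + 1) % global_every == 0
        specs.append(LayerSpec(
            sliding_window=None if is_global else window,
            use_position_embeddings=not is_global,
        ))
    return specs


def causal_mask(seq_len: int, sliding_window: Optional[int]) -> List[List[bool]]:
    """mask[q][k] is True when query position q may attend to key position k."""
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            visible = k <= q  # causal: never look at future tokens
            if sliding_window is not None:
                # sliding layers only see the most recent `sliding_window` tokens
                visible = visible and (q - k) < sliding_window
            row.append(visible)
        mask.append(row)
    return mask


def kv_cache_len(spec: LayerSpec, seq_len: int) -> int:
    """Keys/values a layer must retain: bounded by the window on sliding layers."""
    if spec.sliding_window is None:
        return seq_len
    return min(seq_len, spec.sliding_window)
```

The point relevant to KV-cache management is that the two layer types need different cache sizes: under these assumptions a sliding layer only ever needs the last `window` entries, while a global layer needs the full sequence.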