Conversation

ggerganov
Member

The n_ctx name is causing some confusion, since its actual meaning is the size of the KV cache, while n_ctx_train is the training context of the model.

This change fixes that, but since it is a big one and touches a lot of stuff, I'm not sure it's worth merging. Maybe sometime in the future, when the time is right.

Original PR: #5546
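
For context, the two values can already be queried separately through the public API. A minimal sketch, assuming the current llama.h getters (the function report_sizes is just for illustration):

```cpp
#include "llama.h"
#include <cstdio>

// Print the two values this PR tries to disambiguate.
static void report_sizes(struct llama_context * lctx, const struct llama_model * model) {
    // llama_n_ctx: the size of the KV cache allocated for this context
    // (the value this PR would rename/expose as kv_size)
    const uint32_t n_ctx = llama_n_ctx(lctx);

    // llama_n_ctx_train: the context length the model was trained with
    const int32_t n_ctx_train = llama_n_ctx_train(model);

    if ((int32_t) n_ctx > n_ctx_train) {
        printf("warning: KV cache size (%u) exceeds training context (%d)\n",
               n_ctx, n_ctx_train);
    }
}
```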

@ggerganov added the breaking change (Changes that break ABIs, APIs, file formats, or other forms of backwards compatibility) and refactoring labels on Feb 18, 2024
@bobqianic
Contributor

Does n_ctx in whisper.cpp also refer to the size of the KV cache?

@ggerganov
Member Author

In the decoder - yes


```diff
 // llm_build_context
-static constexpr int32_t n_kv = 32; // size of KV cache to consider (n_kv <= n_ctx
+static constexpr int32_t n_kv = 32; // size of KV cache to consider (n_kv <= kv_size
```
Collaborator

looks like this comment was missing a closing )
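
Presumably the intended fix is just to close the parenthesis:

```cpp
static constexpr int32_t n_kv = 32; // size of KV cache to consider (n_kv <= kv_size)
```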

@compilade
Collaborator

compilade commented Feb 21, 2024

I do not agree with this change (but I like the underlying intention of making llama.cpp less confusing).

As I'm working on supporting Mamba in llama.cpp (see #5328), I'd like to warn that renaming n_ctx to kv_size would make it harder to support non-Transformer architectures in a straightforward way.

With Mamba, the KV cache size is tied to the maximum number of distinct sequences processed at the same time, not to the "context size". n_ctx is still used to limit the maximum number of processed tokens, which is fine, because some examples need a fixed size for the buffer of input tokens (e.g. in server, lookahead, lookup, parallel, and perplexity when calling llama_batch_init).

What I propose instead (and this is what I've started doing in #5328) is to keep n_ctx, but use kv_self.size instead of n_ctx in the places where it's really the KV cache size that is meant, because Mamba breaks the equivalence of n_ctx with kv_self.size.

TL;DR: renaming n_ctx to kv_size makes it harder to decouple the context size from the KV cache size.
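
A minimal sketch of the decoupling proposed above; the struct names here are hypothetical simplifications, not the actual llama.cpp definitions:

```cpp
#include <cstdint>

// Hypothetical, simplified mirror of the internal state.
struct kv_cache_sketch {
    uint32_t size; // number of KV cells; for Mamba this tracks the max number
                   // of distinct sequences, not the context size
};

struct context_sketch {
    uint32_t n_ctx;          // max number of tokens per sequence (kept as-is)
    kv_cache_sketch kv_self; // sized independently of n_ctx
};

// Call sites that really mean "KV cache size" read kv_self.size ...
static uint32_t n_kv_cells(const context_sketch & ctx) {
    return ctx.kv_self.size; // not ctx.n_ctx
}

// ... while token-buffer sizing keeps using n_ctx, as when the examples
// call llama_batch_init.
static uint32_t token_buffer_size(const context_sketch & ctx) {
    return ctx.n_ctx;
}
```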

@ggerganov closed this on Jul 26, 2024