kv-cache : rework kv_cell #13706
|
In the next PR I will try to rework these 3 methods with something like llama.cpp/src/llama-kv-cache.h Lines 45 to 56 in 9023ae3
The main goal is to be able to run SWA caches with just When this rework is ready, I will use the new llama.cpp/src/llama-kv-cache.h Lines 37 to 41 in 9023ae3
Simulating a full cache will now be achieved by initializing the appropriate batches and simply not processing them. Any suggestions about the plan are welcome. |
|
While this change does not have a measurable impact on the performance under normal conditions, when building in
./scripts/compare-commits.sh master gg/kv-cache-simplify-part2 -m ./models/llama-3.2-1b-instruct/ggml-model-q8_0.gguf -fa 1 -d 8192 -n 128 -p 0,1024 -r 5
diff --git a/scripts/compare-commits.sh b/scripts/compare-commits.sh
index e40d1cc6d..7d9ca79cf 100755
--- a/scripts/compare-commits.sh
+++ b/scripts/compare-commits.sh
@@ -24,7 +24,7 @@ dir="build-bench"
 function run {
     rm -fr ${dir} > /dev/null
-    cmake -B ${dir} -S . $cmake_opts > /dev/null
+    cmake -DCMAKE_BUILD_TYPE=Debug -B ${dir} -S . $cmake_opts > /dev/null
     cmake --build ${dir} -t llama-bench > /dev/null
     ${dir}/bin/llama-bench -o sql -oe md $bench_args | sqlite3 llama-bench.sqlite
 } |
cont #13194
The KV cells editing logic is now implemented via the new struct llama_kv_cells_unified in the new src/llama-kv-cells.h source. The goal is to simplify the implementation in llama-kv-cache.cpp and make it easier to understand and update in the future.

One of the primary simplifications is that llama_kv_cache_unified no longer tracks the number of used cells manually. This is now automatically tracked by llama_kv_cells_unified based on the edits that we apply, such as adding and removing sequences from the cells. The same applies to the has_shift flag.

- The per-cell data (pos, delta, seq) is now stored as a structure of arrays for better cache locality
- std::bitset is used instead of std::set

Here is an example of the position shift logic before and after the change:
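As a rough illustration of the bookkeeping described above, and not the code from this PR, here is a minimal C++ sketch of a cells container that keeps its own used count and has_shift flag in sync with every edit. The type kv_cells_sketch, the constant MAX_SEQ, and the methods occupy, seq_add, seq_rm, and pos_add are hypothetical names chosen for this example. The real llama_kv_cells_unified interface may differ.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical upper bound on sequence ids (an assumption for this sketch).
constexpr std::size_t MAX_SEQ = 64;

// Simplified stand-in for a KV cells container; not the llama.cpp API.
struct kv_cells_sketch {
    // Structure of arrays: one contiguous vector per field.
    std::vector<std::int32_t>         pos;   // token position, -1 means empty
    std::vector<std::int32_t>         shift; // accumulated position delta
    std::vector<std::bitset<MAX_SEQ>> seq;   // sequences that reference the cell

    explicit kv_cells_sketch(std::size_t n) : pos(n, -1), shift(n, 0), seq(n) {}

    bool          is_empty(std::size_t i) const { return pos[i] < 0; }
    std::uint32_t get_used()      const { return used; }
    bool          get_has_shift() const { return has_shift; }

    // Put a token at position p into an empty cell, owned by seq_id.
    void occupy(std::size_t i, std::int32_t p, std::size_t seq_id) {
        assert(is_empty(i) && p >= 0);
        pos[i] = p;
        seq[i].set(seq_id);
        ++used; // the container updates the count, not the caller
    }

    // Let one more sequence reference an occupied cell.
    void seq_add(std::size_t i, std::size_t seq_id) {
        assert(!is_empty(i));
        seq[i].set(seq_id);
    }

    // Drop one sequence; the cell is freed once no sequence references it.
    void seq_rm(std::size_t i, std::size_t seq_id) {
        assert(!is_empty(i));
        seq[i].reset(seq_id);
        if (seq[i].none()) {
            clear(i);
        }
    }

    // Shift the position of an occupied cell by d and record the delta.
    // Returns true if the shift pushed the position below zero and the
    // cell was freed as a result (a design assumption of this sketch).
    bool pos_add(std::size_t i, std::int32_t d) {
        assert(!is_empty(i));
        pos[i]   += d;
        shift[i] += d;
        has_shift = true;
        if (pos[i] < 0) {
            clear(i);
            return true;
        }
        return false;
    }

private:
    std::uint32_t used      = 0;
    bool          has_shift = false;

    void clear(std::size_t i) {
        pos[i]   = -1;
        shift[i] = 0;
        seq[i].reset();
        --used;
    }
};
```

With this shape, a caller that shifts a range of cells only needs to loop over the indices and call pos_add. It never touches used or has_shift directly, which is the kind of simplification the description above refers to.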
Next
n = cell_max()) instead of searching for it on every batch