
Conversation

slaren (Member) commented Sep 5, 2024

There were several issues with KV defragmentation when using a quantized KV cache:

  • Defragmentation requires ggml_cpy from a quantized type to the same quantized type, which was not supported in the CUDA backend
  • ggml_backend_sched cannot fall back to the CPU backend when the destination tensor is pre-allocated, and this case was not correctly detected
  • Attempting the fallback anyway caused a buffer overflow in the graph leafs, which resulted in a crash

This fixes the detection in ggml_backend_sched and adds support to the CUDA backend for ggml_cpy when the source and destination types are the same and both tensors are contiguous (using cudaMemcpyAsync).
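The condition that gates the new fast path can be sketched as follows. This is a simplified, hypothetical stand-in (the struct, field names, and `can_memcpy` helper are illustrative, not ggml's actual API): a raw byte copy such as cudaMemcpyAsync is only valid when both tensors share the same type and are contiguous; otherwise a type-aware copy kernel is required.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for ggml tensor metadata. */
enum tensor_type { TYPE_F32, TYPE_F16, TYPE_Q4_0, TYPE_Q8_0 };

struct tensor {
    enum tensor_type type; /* element/block type */
    bool contiguous;       /* laid out as one dense block of memory */
    size_t nbytes;         /* total size in bytes */
};

/* A plain byte copy (memcpy / cudaMemcpyAsync) is only correct when the
 * types match, both tensors are contiguous, and the sizes agree. For a
 * quant-to-quant copy of the same type this holds, which is what makes
 * the defragmentation copies possible without a dequantize/requantize. */
static bool can_memcpy(const struct tensor *src, const struct tensor *dst) {
    return src->type == dst->type
        && src->contiguous
        && dst->contiguous
        && src->nbytes == dst->nbytes;
}
```

In the CUDA backend the copy itself would then be a single cudaMemcpyAsync on the device buffers; when the check fails, a per-type copy kernel (or a CPU fallback, where one is allowed) has to be used instead.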

Other backends may also be affected.

Fixes #9314

@github-actions bot added the Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) labels Sep 5, 2024
@slaren force-pushed the sl/fix-cuda-defrag branch from 290a6e5 to e462919 on September 5, 2024 01:50
@slaren merged commit 4db0478 into master Sep 5, 2024
@slaren deleted the sl/fix-cuda-defrag branch September 5, 2024 09:13
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Sep 5, 2024
cuda : fix defrag with quantized KV (ggml-org#9319)
MaggotHATE added a commit to MaggotHATE/Llama_chat that referenced this pull request Sep 11, 2024
* Important: this guards an assert in ggml-backend.c introduced in ggml-org/llama.cpp#9319, be aware
* Merged recent Seed commit
* Added a small .txt guide on code that needs to be added to make clblast work on current llama.cpp
* minor display styling
dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Feb 25, 2025

Labels

ggml (changes relating to the ggml tensor library for machine learning), Nvidia GPU (Issues specific to Nvidia GPUs)


Development

Successfully merging this pull request may close these issues.

Bug: llama-server crash when defragmenting (llama_kv_cache_defrag_internal)

2 participants