llama_decode lock #595

martindevans · 2024-03-12T21:02:48Z

Added a lock object to `SafeLlamaModelHandle`, which every call to `llama_decode` (in `SafeLLamaContextHandle`) acquires first.

This prevents two contexts from running inference on the same model at the same time, which seems to be unsafe in llama.cpp. We may need an even wider lock (preventing inference on any two models simultaneously). Testing required.
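The per-model locking described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ModelSketch`, `ContextSketch`, `FakeNative` and `LockDemo` are hypothetical names, and `FakeNative.Decode` is a stand-in that only records how many callers overlap. It does not call llama.cpp.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a native call that must not run concurrently.
// It is NOT the real llama_decode binding; it only tracks overlapping callers.
public static class FakeNative
{
    private static readonly object StatsLock = new object();
    private static int _active;
    public static int MaxConcurrent;

    public static int Decode()
    {
        int now = Interlocked.Increment(ref _active);
        lock (StatsLock)
        {
            // Remember the highest number of simultaneous callers seen.
            if (now > MaxConcurrent) MaxConcurrent = now;
        }
        Thread.Sleep(5); // simulate some work
        Interlocked.Decrement(ref _active);
        return 0;
    }
}

// Hypothetical model wrapper that owns the lock object.
public sealed class ModelSketch
{
    public object InferenceLock { get; } = new object();
}

// Hypothetical context wrapper: every decode takes the model's lock first.
public sealed class ContextSketch
{
    private readonly ModelSketch _model;

    public ContextSketch(ModelSketch model) => _model = model;

    public int Decode()
    {
        lock (_model.InferenceLock)
        {
            return FakeNative.Decode();
        }
    }
}

public static class LockDemo
{
    // Runs `contexts` decodes in parallel against one shared model and
    // returns the maximum number of decodes that overlapped.
    public static int Run(int contexts)
    {
        var model = new ModelSketch();
        var tasks = new Task[contexts];
        for (int i = 0; i < contexts; i++)
        {
            var ctx = new ContextSketch(model);
            tasks[i] = Task.Run(() => ctx.Decode());
        }
        Task.WaitAll(tasks);
        return FakeNative.MaxConcurrent;
    }
}
```

In this sketch, contexts that share a model are serialized, while contexts on different models could still overlap. Turning `InferenceLock` into a single `static readonly` object shared by every model would give the wider, process-wide lock mentioned above.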


martindevans added 2 commits March 12, 2024 21:01

Added a lock object into SafeLlamaModelHandle which all calls to `llama_decode` (in the `SafeLLamaContextHandle`) lock first. This prevents two contexts from running inference on the same model at the same time, which seems to be unsafe in llama.cpp.

011d019

Modified the lock to be global over _all_ inferences. This seems to be necessary (at least with the CUDA backend).

7e1f472

martindevans merged commit ce4de7d into SciSharp:master Mar 13, 2024

martindevans deleted the llama_decode_lock branch March 13, 2024 00:33

SanftMonster mentioned this pull request Mar 28, 2024

Cannot add a user message after another user message (Parameter message #585

Closed
