Conversations leaking into each other during parallelization on RPC cluster #14893
-
Most likely fixed with #14853
-
I'm really sorry to say this, but we just tested it with a git pull from about an hour ago (commit 89d1029), and it's still producing strange output after a few posts. We both tested with SillyTavern this time, so I could see the output in the consoles. My friend ended up with my characters being mentioned in his dialogue, and I got characters from his. Here's an example from my friend's conversation, while I (on a separate PC, with the profile 'Becky') was conversing with a character named Gina:

"Conversely, a stale odor or sudden violence, lips curling onto the balls, lips cursinglingerseverbalances convey meaning convey meaningfully work of overexplaining stating Gina, Always,** Creative Progressionarrative Progression:** Engaging Character Interactions: Respond thoughtfully to Becky's dialogue, actions, and environmental cues. Let Gina's reactions stem from subtle shifts: distant voices softening as they move deeper, the weight settling spine beneath their fingertips, or a gentle breeze carrying a hint of fallen rain."

I checked my friend's SillyTavern console, and there were no references in there to a character named Becky or Gina, meaning that llama.cpp is most likely getting those two characters from my parallel conversation. Shortly after he got that comment, I got this one:

We both got partway through our conversations before this strangeness started occurring. I'm really sorry, but is there anything we can try at this point, please?
-
I'm sorry to bother you all. My colleague and I have been having trouble with the latest versions of llama.cpp, running over RPC. We do have a rather exotic setup, which I'll describe:
Primary Llama.cpp Machine: 1x GeForce 4090, 1x GeForce 4060 (CUDA)
Secondary Llama.cpp Machine: 1x GeForce 4060 (CUDA)
Tertiary Llama.cpp Machine: 1x RX 7900 XTX (Vulkan)
The primary machine starts with this command line:

```
set "LLAMA_SET_ROWS=1" && llama-server -m "X:\models\ML2-123B-Magnum-Diamond-IQ3_M-00001-of-00002.gguf" --host 192.168.1.69 --port 5002 -c 50000 --rpc 192.168.1.94:50052,192.168.1.70:50052 --n_gpu_layers 99 --no-warmup -fa -t 16 --threads-batch 16 -ctk f16 -ctv f16 -sm row --parallel 2
```
We're operating with the LLAMA_SET_ROWS=1 environment variable, and no CPUs are involved in inference.
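In case slot assignment matters here: with --parallel 2 the server exposes two slots, and as I understand it a request can be pinned to one explicitly via the id_slot field of the /completion endpoint (our frontends normally leave this at the default of -1, i.e. any idle slot). A minimal sketch of what a pinned request would look like, with a placeholder prompt:

```sh
# Sketch: pin this request to slot 0 so it never shares a slot with the
# other client. id_slot and cache_prompt are documented /completion
# fields, but the prompt and values here are illustrative only.
curl -s http://192.168.1.69:5002/completion -d '{
  "prompt": "Write a bash loop over all .txt files in a directory.",
  "n_predict": 64,
  "id_slot": 0,
  "cache_prompt": false
}'
```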
We have parallelization enabled, and two of us access the RPC cluster simultaneously. Strangely, with llama.cpp compiled from various commits over the past week, we've been seeing conversations 'leak' or 'cross-talk' into each other, for want of a better word, and sometimes the model even produces semi-lucid 'stream of thought' output drawing on both conversations.
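To show the cross-talk without SillyTavern in the loop, something like the following two-client test is what we've been approximating by hand (just a sketch; the prompts, marker names, and output files are made up for illustration, and it assumes curl can reach the server's /completion endpoint):

```sh
#!/usr/bin/env bash
# Sketch: fire two unrelated conversations at the server concurrently,
# then check whether either reply mentions a marker word that only
# appears in the *other* prompt.
URL=http://192.168.1.69:5002/completion

curl -s "$URL" -d '{"prompt":"You are GINA, a chef. Describe the menu for tonight.","n_predict":128}' > reply_a.json &
curl -s "$URL" -d '{"prompt":"You are BECKY, a pilot. Describe your preflight checklist.","n_predict":128}' > reply_b.json &
wait

# Leakage check: GINA's reply should never mention BECKY, and vice versa.
grep -qi becky reply_a.json && echo "LEAK: conversation B bled into A"
grep -qi gina  reply_b.json && echo "LEAK: conversation A bled into B"
```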
The last version that worked correctly for us was from the 15th of July (commit cbc68be). Some commits after that outright broke parallelization for us, so we skipped those (around and including commit 21c0217). When we moved to a newer version (commit 9008328), parallelization began working again.
At this point, when we started sending messages back and forth to the LLM, it became evident that something was wrong. I'd often get messages that seemed to include some of his context, and he'd often get messages that seemed to include some of mine. As an example, if I'm talking to the LLM about bash scripting and he's talking to it about writing professional letters, he'd start getting elements of bash scripting seamlessly woven into his conversation, and I'd start getting advice on formal writing in my bash scripts. And sometimes it would just be a messy stream of thought, with made-up words, incoherent grammar, and nonsensical punctuation.
I wanted to do a git bisect, but the trouble is that we invariably land on a commit where parallelization is outright broken, so we can't test for the conversation leakage/cross-talk at all (a sketch of what I was attempting follows).
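For reference, this is roughly the bisect procedure I had in mind; as I understand it, git bisect skip can mark the untestable commits, though with a long run of skipped commits the final answer may only narrow things to a range:

```sh
# Sketch: bisect for the leak, skipping commits where parallel decoding
# is outright broken and the leak therefore can't be tested.
git bisect start
git bisect bad 9008328      # leakage/cross-talk reproduces here
git bisect good cbc68be     # the July 15 build that behaved correctly
# At each step: rebuild, run the two-client test above, then one of:
git bisect good             # no leak observed
git bisect bad              # leak observed
git bisect skip             # parallelization broken; commit untestable
```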
I'm at a loss as to how to handle this situation. Does anyone have any ideas, please?