Conversations leaking into each other during parallelization on RPC cluster #14893
-
Most likely fixed with #14853
-
I'm really sorry to say this, but we just tested it with a git pull from about an hour ago (commit 89d1029), and it's still producing strange output after a few posts. We both tested with SillyTavern this time, so I could see the output in the consoles. My friend ended up with my characters being mentioned in his dialogue, and I got characters from his. Here's an example from my friend's conversation, while I (on a separate PC, with the profile 'Becky') was conversing with a character named Gina:

"Conversely, a stale odor or sudden violence, lips curling onto the balls, lips cursinglingerseverbalances convey meaning convey meaningfully work of overexplaining stating Gina, Always,** Creative Progressionarrative Progression:** Engaging Character Interactions: Respond thoughtfully to Becky's dialogue, actions, and environmental cues. Let Gina's reactions stem from subtle shifts: distant voices softening as they move deeper, the weight settling spine beneath their fingertips, or a gentle breeze carrying a hint of fallen rain."

I checked my friend's SillyTavern console, and there were no references in there to a character named Becky or Gina, meaning that llama.cpp is most likely getting those two characters from my parallel conversation. Shortly after he got that comment, I got this one:

We both got partway through our conversations before this strangeness started occurring. I'm really sorry, but is there anything we can try at this point, please?
-
I'm sorry to bother you all. My colleague and I have been having trouble with the latest versions of llama.cpp, running over RPC. We do have a rather exotic setup, which I'll describe:
Primary Llama.cpp Machine: 1x GeForce 4090, 1x GeForce 4060 (CUDA)
Secondary Llama.cpp Machine: 1x GeForce 4060 (CUDA)
Tertiary Llama.cpp Machine: 1x RX 7900 XTX (Vulkan)
The primary machine starts with this command line:

```
set "LLAMA_SET_ROWS=1" && llama-server -m "X:\models\ML2-123B-Magnum-Diamond-IQ3_M-00001-of-00002.gguf" --host 192.168.1.69 --port 5002 -c 50000 --rpc 192.168.1.94:50052,192.168.1.70:50052 --n_gpu_layers 99 --no-warmup -fa -t 16 --threads-batch 16 -ctk f16 -ctv f16 -sm row --parallel 2
```
We're operating with the LLAMA_SET_ROWS=1 environment variable, and no CPUs are involved in inference.
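In case slot assignment matters here: with --parallel 2 the server exposes two slots, and as I understand it a request can be pinned to one explicitly via the id_slot field of the /completion endpoint (our frontends normally leave this at the default of -1, i.e. any idle slot). A minimal sketch of what a pinned request would look like, with a placeholder prompt:

```sh
# Sketch: pin this request to slot 0 so it never shares a slot with the
# other client. id_slot and cache_prompt are documented /completion
# fields, but the prompt and values here are illustrative only.
curl -s http://192.168.1.69:5002/completion -d '{
  "prompt": "Write a bash loop over all .txt files in a directory.",
  "n_predict": 64,
  "id_slot": 0,
  "cache_prompt": false
}'
```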
We have parallelization enabled, and two of us access the RPC cluster simultaneously. Strangely, with llama.cpp compiled from various commits over the past week, we've been seeing conversations 'leak' or 'cross-talk' into each other, for want of a better word, and sometimes the model even produces semi-lucid 'stream of thought' output drawing on both conversations.
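To show the cross-talk without SillyTavern in the loop, something like the following two-client test is what we've been approximating by hand (just a sketch; the prompts, marker names, and output files are made up for illustration, and it assumes curl can reach the server's /completion endpoint):

```sh
#!/usr/bin/env bash
# Sketch: fire two unrelated conversations at the server concurrently,
# then check whether either reply mentions a marker word that only
# appears in the *other* prompt.
URL=http://192.168.1.69:5002/completion

curl -s "$URL" -d '{"prompt":"You are GINA, a chef. Describe the menu for tonight.","n_predict":128}' > reply_a.json &
curl -s "$URL" -d '{"prompt":"You are BECKY, a pilot. Describe your preflight checklist.","n_predict":128}' > reply_b.json &
wait

# Leakage check: GINA's reply should never mention BECKY, and vice versa.
grep -qi becky reply_a.json && echo "LEAK: conversation B bled into A"
grep -qi gina  reply_b.json && echo "LEAK: conversation A bled into B"
```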
The last version that worked correctly for us was from the 15th of July (commit cbc68be). Some commits after that outright broke parallelization for us, so we skipped those (around and including commit 21c0217). When we moved to a newer version (commit 9008328), parallelization began working again.
At this point, when we started sending messages back and forth to the LLM, it became evident that something was wrong. I'd often get messages that seemed to include some of his context, and he'd often get messages that seemed to include some of mine. As an example, if I'm talking to the LLM about bash scripting and he's talking to it about writing professional letters, he'd start getting elements of bash scripting seamlessly woven into his conversation, and I'd start getting advice on formal writing in my bash scripts. And sometimes it would just be a messy stream of thought, with made-up words, incoherent grammar, and nonsensical punctuation.
I wanted to do a git bisect, but the trouble is that we invariably land on a commit where parallelization is outright broken, so we can't test for the conversation leakage/cross-talk at all (a sketch of what I was attempting follows).
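For reference, this is roughly the bisect procedure I had in mind; as I understand it, git bisect skip can mark the untestable commits, though with a long run of skipped commits the final answer may only narrow things to a range:

```sh
# Sketch: bisect for the leak, skipping commits where parallel decoding
# is outright broken and the leak therefore can't be tested.
git bisect start
git bisect bad 9008328      # leakage/cross-talk reproduces here
git bisect good cbc68be     # the July 15 build that behaved correctly
# At each step: rebuild, run the two-client test above, then one of:
git bisect good             # no leak observed
git bisect bad              # leak observed
git bisect skip             # parallelization broken; commit untestable
```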
I'm at a loss as to how to handle this situation. Does anyone have any ideas, please?