Feat/tensor override #1180
Conversation
yes please, that'll remove a lot of noise from this PR and make it easier to review.
Force-pushed from 3dd6110 to 8dc45de.
Okay, done. But now it's going to show a conflict when you merge the Apr_2025 branch, because they both "added" tensor_buft_overrides to LLamaModelParams. :P
CI is failing due to …
I've resolved the minor merge conflict. Just waiting on tests before I merge this.
```csharp
/// </summary>
public static class IModelParamsExtensions
{
    private static LLamaTensorBufferOverrideHelper bufferOverrideHelper = new();
```
This being static and shared between calls seems like a potential source of bugs. Can a new non-shared one be allocated when it's needed?
I'd like it to, but I was thinking it would require some much messier changes to how the ModelParams is converted to LLamaModelParams. Or the memory would have to be allocated in the LLamaModelParams in the first place for proper disposal, but that's a native struct. So we'd have to do something like wrap LLamaModelParams in another class that keeps the allocated null-terminated LLamaModelTensorBufferOverride array alive only until it can be properly disposed by the wrapper class. Again, I don't do much native stuff in C#, so maybe I'm overthinking it and you can just toss an array of structs into LLamaModelParams instead of an IntPtr...
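A rough sketch of that wrapper idea might look something like this. Every name here except LLamaModelParams is hypothetical, and this is only one possible shape, not the PR's actual approach:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical sketch: a disposable owner that keeps the native,
// null-terminated override array alive exactly as long as the
// LLamaModelParams struct needs it.
internal sealed class LLamaModelParamsOwner : IDisposable
{
    public LLamaModelParams Params;   // the native struct handed to llama.cpp
    private IntPtr _overrides;        // unmanaged memory backing tensor_buft_overrides

    public LLamaModelParamsOwner(LLamaModelParams @params, IntPtr overrides)
    {
        Params = @params;
        _overrides = overrides;
    }

    public void Dispose()
    {
        // free the unmanaged override array exactly once
        if (_overrides != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_overrides);
            _overrides = IntPtr.Zero;
        }
    }
}
```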
At the moment you're doing `disposer.Add(bufferOverrideHelper);`. That means that the `bufferOverrideHelper` object will be disposed when necessary (when LLamaSharp has finished using the `LLamaModelParams` struct).

So (unless I'm misunderstanding something here) you've currently got a system where the memory you're allocating is deallocated by that dispose call (which seems reasonable!). But then you've got it as a static, which means the object managing the memory is shared.
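Concretely, the suggestion amounts to something like the sketch below: allocate a fresh (non-static) helper per conversion and register it with the disposer, so each call owns and eventually frees its own native memory. The method shape and the disposer type name are assumptions based on the snippets quoted in this thread:

```csharp
// Sketch only: one helper per call, no state shared between conversions.
public static class ModelParamsConversionSketch
{
    public static LLamaModelParams ToLlamaModelParams(this IModelParams @params, GroupDisposable disposer)
    {
        var result = new LLamaModelParams();

        // A fresh helper for this call, instead of a shared static field.
        var bufferOverrideHelper = new LLamaTensorBufferOverrideHelper();
        // ... populate result (including tensor_buft_overrides) via the helper ...

        // Disposed once LLamaSharp has finished with the struct.
        disposer.Add(bufferOverrideHelper);
        return result;
    }
}
```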
```csharp
/// <summary>
/// Helper for creating and managing tensor buffer overrides
/// </summary>
public class LLamaTensorBufferOverrideHelper : IDisposable
```
Can this be made `private`? It seems like it's only used internally and exposes quite a lot of new API surface if left public.
We can certainly make it `internal`, but not `private` unless you want to move it into `IModelParamsExtensions`, which I guess is fine (especially with partial classes being a thing now). I think I probably just left it public because `ToLlamaModelParams` is public.
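For illustration, `private` would mean nesting the helper inside the extensions class, roughly like this (a hypothetical shape, not the PR's code):

```csharp
// 'private' requires nesting inside the extensions class (workable via
// partial classes), whereas 'internal' keeps the helper top-level but
// invisible to consumers of the library.
public static partial class IModelParamsExtensions
{
    private sealed class LLamaTensorBufferOverrideHelper : IDisposable
    {
        public void Dispose()
        {
            // free the native override allocations here
        }
    }
}
```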
Derp, I meant `internal`, sorry.
That looks like a genuine test failure, due to the new properties in ModelParams.
Looks like it's just failing because Assert.Equals is doing a by-reference comparison. I'll modify the test to treat the new property the same way as TensorSplits and MetadataOverrides.
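In other words, the round-trip test needs to compare the collection's contents rather than its reference. A hedged sketch, assuming an xUnit test and illustrative property names on TensorBufferOverride (Pattern and BufferType are assumptions):

```csharp
// Compare element-wise, mirroring how TensorSplits / MetadataOverrides
// are treated, instead of relying on a by-reference equality check.
Assert.Equal(expected.TensorBufferOverrides.Count, actual.TensorBufferOverrides.Count);
for (var i = 0; i < expected.TensorBufferOverrides.Count; i++)
{
    Assert.Equal(expected.TensorBufferOverrides[i].Pattern, actual.TensorBufferOverrides[i].Pattern);
    Assert.Equal(expected.TensorBufferOverrides[i].BufferType, actual.TensorBufferOverrides[i].BufferType);
}
```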
Tested with this configuration in BatchedExecutorSimple:

```csharp
parameters.GpuLayerCount = 99;
parameters.TensorBufferOverrides = new List<Abstractions.TensorBufferOverride>
{
    new(@"blk\.(2[6-9]|[3-4][0-9]).*", "CPU")
};
```

I used that because it sped up Qwen-3-30B-A3B by a factor of 10 on my machine (though the gain would likely be smaller for batching, since it's an MoE).
Thanks for all the work on this!
Based on the April_2025 branch. Should I rebase it?
Example usage:
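For instance, something along these lines (a sketch reconstructed from the description below and the configuration shown earlier in this thread; the exact pattern string is an assumption):

```csharp
// Assumed pattern: match any tensor whose name contains 'ffn'
// and keep it in the CPU buffer type.
parameters.TensorBufferOverrides = new List<Abstractions.TensorBufferOverride>
{
    new(".*ffn.*", "CPU")
};
```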
This would make all tensors with 'ffn' in their names offload to the CPU, with everything else on the GPU.