Conversation

dpmm99
Contributor

@dpmm99 dpmm99 commented May 2, 2025

Based on the April_2025 branch. Should I rebase it?

Example usage:

```csharp
var modelParams = new ModelParams(path)
{
    TensorBufferOverrides = [new Abstractions.TensorBufferOverride(".*ffn.*", "CPU")],
    GpuLayerCount = -1
};
```

This would offload all tensors with 'ffn' in their names to the CPU, keeping everything else on the GPU.
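For context, here is a fuller sketch of how this might be used end to end. The model path is a placeholder, and `LLamaWeights.LoadFromFile` is assumed as the usual loading entry point:

```csharp
using LLama;
using LLama.Common;

// Placeholder path; any local GGUF model works here.
var modelParams = new ModelParams("models/model.gguf")
{
    // Keep every tensor whose name contains "ffn" in CPU memory...
    TensorBufferOverrides = [new Abstractions.TensorBufferOverride(".*ffn.*", "CPU")],
    // ...while otherwise offloading all layers to the GPU.
    GpuLayerCount = -1
};

using var weights = LLamaWeights.LoadFromFile(modelParams);
```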

@martindevans
Member

martindevans commented May 2, 2025

> Should I rebase it?

yes please, that'll remove a lot of noise from this PR and make it easier to review.

@dpmm99 dpmm99 force-pushed the feat/tensor-override branch from 3dd6110 to 8dc45de on May 2, 2025 20:19
@dpmm99
Contributor Author

dpmm99 commented May 2, 2025

> Should I rebase it?

> yes please, that'll remove a lot of noise from this PR and make it easier to review.

Okay, done. But now it's going to show a conflict when you merge the Apr_2025 branch because they both "added" tensor_buft_overrides to LLamaModelParams. :P

@martindevans
Member

CI is failing due to `ot` being detected as a spelling error; just add a suppression for that.

@martindevans
Member

I've resolved the minor merge conflict. Just waiting on tests before I merge this.

/// </summary>
public static class IModelParamsExtensions
{
    private static LLamaTensorBufferOverrideHelper bufferOverrideHelper = new();
Member

This being static and shared between calls seems like a potential source of bugs. Can a new non-shared one be allocated when it's needed?

Contributor Author

I'd like it to, but I think that requires some much messier changes to how the ModelParams is converted to LLamaModelParams. Alternatively, the memory would have to be allocated in the LLamaModelParams in the first place for proper disposal, but that's a native struct, so we'd have to do something like wrap LLamaModelParams in another class that keeps the allocated null-terminated LLamaModelTensorBufferOverride array alive until the wrapper can properly dispose of it. Then again, I don't do much native interop in C#, so maybe I'm overthinking it and you can just toss an array of structs into LLamaModelParams instead of an IntPtr...
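For illustration, a rough sketch of the null-terminated native array approach described above. The struct layout and field names are assumptions modelled on llama.cpp's `llama_model_tensor_buft_override`, not LLamaSharp's actual definitions:

```csharp
using System;
using System.Runtime.InteropServices;

// Assumed layout: a pattern string pointer and a backend buffer type pointer.
[StructLayout(LayoutKind.Sequential)]
struct NativeTensorBufferOverride
{
    public IntPtr Pattern;    // char* (null-terminated UTF-8 regex)
    public IntPtr BufferType; // ggml_backend_buffer_type_t
}

static class OverrideMarshalling
{
    // Allocates a null-terminated array of overrides. The caller must keep
    // the memory alive until the native side has consumed it, then free it
    // with Marshal.FreeHGlobal -- which is exactly the lifetime problem the
    // wrapper/disposer discussion here is about.
    public static IntPtr Allocate(NativeTensorBufferOverride[] overrides)
    {
        var size = Marshal.SizeOf<NativeTensorBufferOverride>();
        var block = Marshal.AllocHGlobal(size * (overrides.Length + 1));
        for (var i = 0; i < overrides.Length; i++)
            Marshal.StructureToPtr(overrides[i], block + i * size, false);
        // A zeroed final entry acts as the terminator.
        Marshal.StructureToPtr(default(NativeTensorBufferOverride), block + overrides.Length * size, false);
        return block;
    }
}
```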

Member

At the moment you're doing `disposer.Add(bufferOverrideHelper)`. That means that the bufferOverrideHelper object will be disposed when necessary (when LLamaSharp has finished using the LLamaModelParams struct).

So (unless I'm misunderstanding something here) you've currently got a system where the memory you're allocating is deallocated by that dispose call (which seems reasonable!).

But then you've got it as a static, which means the object managing the memory is shared.
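If so, the non-shared version might look something like the sketch below. `GroupDisposable` and `ApplyTo` are assumed/hypothetical names standing in for the real conversion code:

```csharp
public static IDisposable ToLlamaModelParams(this IModelParams @params, out LLamaModelParams result)
{
    // ... existing setup that fills in `result` ...
    result = default;

    var disposer = new GroupDisposable();

    // A fresh helper per call: no state is shared between concurrent
    // conversions, and the disposer frees the native override array once
    // LLamaSharp has finished using the LLamaModelParams struct.
    var bufferOverrideHelper = new LLamaTensorBufferOverrideHelper();
    bufferOverrideHelper.ApplyTo(@params, ref result); // hypothetical
    disposer.Add(bufferOverrideHelper);

    return disposer;
}
```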

/// <summary>
/// Helper for creating and managing tensor buffer overrides
/// </summary>
public class LLamaTensorBufferOverrideHelper : IDisposable
Member

Can this be made private? It seems like it's only used internally and exposes quite a lot of new API surface if left public.

Contributor Author

We can certainly make it internal, but not private unless you want to move it into IModelParamsExtensions, which I guess is fine (especially with partial classes being a thing now). I think I probably just left it public because ToLlamaModelParams is public.

Member

Derp, I meant internal, sorry.
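For reference, the narrowed visibility would look something like this (only the access modifier changes; the body is elided):

```csharp
/// <summary>
/// Helper for creating and managing tensor buffer overrides.
/// Internal rather than public: only the ToLlamaModelParams conversion
/// uses it, so it adds no public API surface.
/// </summary>
internal class LLamaTensorBufferOverrideHelper : IDisposable
{
    // ... implementation unchanged ...

    public void Dispose()
    {
        // Free the native override allocations here.
    }
}
```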

@martindevans
Member

That looks like a genuine test failure, due to the new properties in ModelParams not serialising properly.

@dpmm99
Contributor Author

dpmm99 commented May 11, 2025

> That looks like a genuine test failure, due to the new properties in ModelParams not serialising properly.

Looks like it's just failing because Assert.Equals is doing a by-reference comparison. I'll modify the test to treat the new property the same way as TensorSplits and MetadataOverrides.
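For reference, the element-wise comparison described above might look like this in the test, assuming xUnit and hypothetical `Pattern`/`BufferType` property names on TensorBufferOverride:

```csharp
// Assert.Equals compares references, which always fails after a
// serialisation round-trip, so compare the collections element by element,
// as the test already does for TensorSplits and MetadataOverrides.
Assert.Equal(parameters.TensorBufferOverrides.Count, deserialized.TensorBufferOverrides.Count);
for (var i = 0; i < parameters.TensorBufferOverrides.Count; i++)
{
    Assert.Equal(parameters.TensorBufferOverrides[i].Pattern, deserialized.TensorBufferOverrides[i].Pattern);
    Assert.Equal(parameters.TensorBufferOverrides[i].BufferType, deserialized.TensorBufferOverrides[i].BufferType);
}
```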

dpmm99 added 3 commits May 11, 2025 14:51
Tested with this configuration in BatchedExecutorSimple:
```csharp
parameters.GpuLayerCount = 99;
parameters.TensorBufferOverrides = new List<Abstractions.TensorBufferOverride> { new(@"blk\.(2[6-9]|[3-4][0-9]).*", "CPU") };
```
I used that because it speeds up Qwen-3-30B-A3B by a factor of 10 on my machine (though the speedup would likely be smaller for batching, since it's an MoE).
@martindevans martindevans merged commit daffe73 into SciSharp:master May 11, 2025
6 checks passed
@martindevans
Member

Thanks for all the work on this!
