

martindevans
Member

@martindevans commented Jul 25, 2023

Motivation

During testing I noticed that the avx512 libllama.dll which I was using seemed to be much faster than the default CPU backend.

Proposed Change

This draft PR is a demonstration of a new way to load the native dependencies which we could use. This system performs feature detection on the CPU and then loads up the best DLL that the CPU can run. It is only written for Windows at the moment, but it could easily be extended to other platforms.

Why Is This A Draft?

Using this would require some changes to the backend packages, which I don't know how to make:

Modify LLamaSharp.Backend.Cpu to contain three folders, plus a default at the root:

  • win-avx512/libllama.dll
  • win-avx2/libllama.dll
  • win-avx/libllama.dll
  • libllama.dll: whatever the most basic default should be (perhaps no AVX support at all?)

Modify LLamaSharp.Backend.Cuda11 to contain:

  • win-cuda11/libllama.dll

Modify LLamaSharp.Backend.Cuda12 to contain:

  • win-cuda12/libllama.dll

@SanftMonster
Collaborator

It's a great feature. Since many people don't have an efficient GPU, CPU performance is significant. BTW, would you like to be a committer of this project? I noticed that you're familiar with this area and have completed many good features. Recently I've been too busy with work to maintain this project well. Even once I get free of my damn work, I'd still be happy if we could develop this project together. :)

@martindevans
Member Author

BTW, would you like to be a committer of this project?

That would be great, I'll be happy to help take some of the maintenance weight off you! Thank you for asking :)

@SanftMonster
Collaborator

That would be great, I'll be happy to help take some of the maintenance weight off you! Thank you for asking :)

I've added you to the group with write access. :) If a release needs publishing, please contact me to push the packages. Thanks a lot for all your contributions, and I hope we have a good time developing together!

@martindevans
Member Author

martindevans commented Aug 5, 2023

By the way, if you're adding new DLLs now that you've merged #64, you may want to put them into folders with the appropriate names for this PR at the same time.

That way you won't need to do another release of the runtime packages when this feature gets added.

@martindevans force-pushed the alternative_dependency_loading branch from 309cd3c to 44fe261 on August 9, 2023 14:47
@SanftMonster
Collaborator

SanftMonster commented Aug 9, 2023

By the way, if you're adding new DLLs now that you've merged #64, you may want to put them into folders with the appropriate names for this PR at the same time.

That way you won't need to do another release of the runtime packages when this feature gets added.

Thank you for the reminder, but I didn't notice this comment before 😶‍🌫️. I'll make a new release after this PR is merged :) v0.4.2 is only a pre-release.

@martindevans
Member Author

I've cleaned this up a bit:

  • It's now usable on any platform (not just Windows)
    • It doesn't do anything on the other platforms yet, but there are obvious placeholder blocks waiting to be filled in by a future PR.
  • Rearranged the code a bit so the list of preferred dependencies is much more readable

So I think it's ready for review now.

You'll need to rearrange the DLLs so that they're in separate folders (see the top comment).

@martindevans marked this pull request as ready for review on August 9, 2023 15:25
@martindevans changed the title from "Proposal: CPU Feature Detection" to "CPU Feature Detection" on Aug 14, 2023
This was referenced Aug 20, 2023
static NativeApi()
{
    // Try to load a preferred library, based on CPU feature detection
    TryLoadLibrary();
}
Collaborator

The return value is ignored here; does the library loading still work?

Member Author

That should be fine.

The DllImport methods still work as normal, so if this fails to load anything they will simply try to load the DLL the usual way when first called.

This is actually really important because on macOS and Linux the TryLoadLibrary method does nothing! It always "fails" and falls back to the normal behaviour.
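The fallback described above can be sketched in C#. This is a minimal illustration, not the PR's code: `NativeApiSketch` and `TryLoadFirst` are hypothetical names, the candidate paths are placeholders, and the only real API used is `NativeLibrary.TryLoad` (available since .NET Core 3.0), which returns `false` rather than throwing when a library cannot be loaded.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical sketch of the "try to preload, otherwise fall back" pattern.
public static class NativeApiSketch
{
    // Tries each candidate in order and stops at the first one that loads.
    // Returns false (never throws) if none of them could be loaded.
    public static bool TryLoadFirst(IEnumerable<string> candidates, Func<string, bool> tryLoad)
    {
        foreach (var path in candidates)
        {
            if (tryLoad(path))
                return true;
        }
        return false;
    }

    // NativeLibrary.TryLoad returns false instead of throwing when a file is missing or invalid.
    public static bool TryLoadNative(string path) => NativeLibrary.TryLoad(path, out _);

    static NativeApiSketch()
    {
        // The result is deliberately ignored. If nothing was preloaded, any
        // [DllImport("libllama")] methods still resolve the library the normal
        // way the first time one of them is called.
        // The paths below are placeholders, not the real package layout.
        TryLoadFirst(new[] { "avx2/libllama.dll", "libllama.dll" }, TryLoadNative);
    }
}
```

Taking the loader as a `Func<string, bool>` keeps the ordering logic testable without any real native binaries present.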

@martindevans force-pushed the alternative_dependency_loading branch from 765f5ab to 756a1ad on September 2, 2023 13:03
@martindevans
Member Author

martindevans commented Sep 2, 2023

I've rebased this onto master, so it's fully up to date with all the GGUF changes. I think the only thing left for this PR is to rearrange the native deps (which I'm not completely sure how to do).

The required layout is:

-- All this stuff is distributed in the "CPU runtimes" NuGet package
    /libllama.so (this is the noavx version)
    /libllama.dll (this is the noavx version)
    /libllama.dylib (??? see below)
    /avx
        /libllama.so
        /libllama.dll
    /avx2
        /libllama.so
        /libllama.dll
    /avx512
        /libllama.so
        /libllama.dll

-- This is distributed in the cu12 runtime package
    /cu12.1.0
        /libllama.so
        /libllama.dll
        
-- This is distributed in the cu11 runtime package
    /cu11.7.1
        /libllama.so
        /libllama.dll
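The layout above implies a fixed preference order for the CPU builds. Below is a minimal C# sketch of turning feature flags into an ordered list of candidate paths. `CpuCandidatePaths` and the `root` parameter are hypothetical, it is not the code in this PR, and reading `Avx512F.IsSupported` assumes .NET 8 or later (`Avx` and `Avx2` exist from .NET Core 3.0).

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper (not part of LLamaSharp): builds the ordered list of
// libllama paths to try, following the CPU-package layout shown above.
public static class CpuCandidatePaths
{
    // Platform-specific file name of the native library.
    public static string LibraryFileName()
    {
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows)) return "libllama.dll";
        if (RuntimeInformation.IsOSPlatform(OSPlatform.OSX)) return "libllama.dylib";
        return "libllama.so";
    }

    // Best build first; the root-level (noavx) build is always the last resort,
    // so a failed load of one candidate can fall through to the next.
    public static List<string> Build(string root, string fileName, bool avx, bool avx2, bool avx512)
    {
        var paths = new List<string>();
        if (avx512) paths.Add(Path.Combine(root, "avx512", fileName));
        if (avx2) paths.Add(Path.Combine(root, "avx2", fileName));
        if (avx) paths.Add(Path.Combine(root, "avx", fileName));
        paths.Add(Path.Combine(root, fileName));
        return paths;
    }

    // Queries the current CPU. Avx512F.IsSupported requires .NET 8 or later;
    // on non-x86 hardware all three flags are simply false.
    public static List<string> ForThisMachine(string root) =>
        Build(root, LibraryFileName(), Avx.IsSupported, Avx2.IsSupported, Avx512F.IsSupported);
}
```

Keeping the noavx root build at the end means every machine gets at least one candidate.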

macOS?

At the moment I'm not sure exactly how macOS should be handled. The basic CPU libllama.dylib should probably go in the folder where I've shown it, but I don't know how Metal works.

@SanftMonster
Collaborator

Thank you for this work. It seems this could let us integrate all the DLLs into one backend package. I'll try adding the AVX binaries and testing it.

@martindevans
Member Author

I'm not sure how the CUDA stuff will work if it's all in one package. If it's all in one package I think the CUDA binaries would load (because they exist) but then fail at runtime if CUDA isn't supported. If that's true we'd either need to add a runtime check for CUDA compatibility, or keep them in a separate package.
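One possible shape for the runtime check, offered as an assumption rather than anything this PR implements: probe for the NVIDIA driver library (`nvcuda.dll` on Windows, `libcuda.so.1` on Linux) and only offer the CUDA builds when it loads. `CudaProbe` is a hypothetical name, and the folder names are taken from the layout in the earlier comment. A present driver does not prove the installed CUDA version matches the build, so this is only a first filter; falling through to the next candidate on a failed load would still be needed.

```csharp
#nullable enable
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical sketch: gate the CUDA builds on the presence of the NVIDIA driver.
public static class CudaProbe
{
    // Name of the CUDA driver library on this OS, or null where CUDA is not expected (e.g. macOS).
    public static string? DriverLibraryName()
    {
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows)) return "nvcuda.dll";
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux)) return "libcuda.so.1";
        return null;
    }

    // True if the driver library can be loaded; the handle is released immediately.
    public static bool DriverPresent()
    {
        var name = DriverLibraryName();
        if (name is null || !NativeLibrary.TryLoad(name, out var handle))
            return false;
        NativeLibrary.Free(handle);
        return true;
    }

    // Puts the CUDA builds ahead of the CPU candidates only when the probe passed.
    public static List<string> WithCuda(IEnumerable<string> cpuCandidates, string root, string fileName, bool cudaAvailable)
    {
        var result = new List<string>();
        if (cudaAvailable)
        {
            result.Add(Path.Combine(root, "cu12.1.0", fileName));
            result.Add(Path.Combine(root, "cu11.7.1", fileName));
        }
        result.AddRange(cpuCandidates);
        return result;
    }
}
```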

@SanftMonster
Collaborator

Hey Martin, I really appreciate this good work, but I'm afraid we'll have to delay this feature to the next version. I tested it on my PC and found some strange behaviour:

  1. TryLoadLibrary could not load any of the libraries, even though I believe I placed the files in the correct structure.
  2. Although my computer supports AVX2, it loaded the DLL from the avx folder instead. (Maybe it searches the directory in name order, and avx comes before avx2?)
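The guess in point 2 is plausible: under ordinal string ordering `avx` sorts before `avx2` and `avx512`, so a search that takes the first supported folder in name order on an AVX2 machine picks the AVX build. The hypothetical C# sketch below shows that pattern next to an explicit priority list; it illustrates the suspected cause and is not taken from the PR.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of the suspected ordering bug and a fix.
public static class FolderOrder
{
    // Explicit preference: best instruction set first.
    static readonly string[] Priority = { "avx512", "avx2", "avx" };

    // Suspected bug pattern: first supported folder in sorted (ordinal) name order.
    // Ordinally "avx" < "avx2" < "avx512", so the weakest supported build wins.
    public static string? FirstBySortedName(IEnumerable<string> folders, ISet<string> supported) =>
        folders.OrderBy(f => f, StringComparer.Ordinal).FirstOrDefault(f => supported.Contains(f));

    // Fix: walk the explicit priority list instead of the directory listing.
    public static string? FirstByPriority(IEnumerable<string> folders, ISet<string> supported)
    {
        var present = new HashSet<string>(folders);
        return Priority.FirstOrDefault(f => present.Contains(f) && supported.Contains(f));
    }
}
```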

I think we should publish the new version quickly since the file format has been changed from ggml to gguf. I believe we'll resolve the problems above before the next release. 😊

@martindevans
Copy link
Member Author

That's absolutely fine! GGUF is a really critical feature to get out ASAP, whereas this can wait.

@Oceania2018 merged commit 10678a8 into SciSharp:master on Sep 17, 2023
@martindevans deleted the alternative_dependency_loading branch on September 17, 2023 20:47
@martindevans
Member Author

We'll need to test this extensively on different hardware before the next release!
