CPU Feature Detection #65
It's a great feature. Since many people don't have a capable GPU, CPU performance matters a lot. BTW, would you like to be a committer on this project? I noticed that you're familiar with this area and have completed many good features. Recently I've been too busy with work to maintain this project well. Even once I get away from my damn work after some time, I'd still be happy if you could develop this project together with me. :)
That would be great, I'll be happy to help if I can take some of the maintenance weight off you! Thank you for asking :)
I've added you to the group with write access. :) If you need to publish a release, please contact me to push the packages. Thank you a lot for all your contributions, and I hope we have a good time developing together!
By the way, if you're adding new DLLs since you merged #64 you may want to put them into folders with appropriate names for this PR at the same time? That way you won't need to do another release of the runtime packages when this feature gets added.
Force-pushed from 309cd3c to 44fe261.
Thank you for the reminder, but I didn't notice this comment earlier 😶‍🌫️. I'll make a new release after this PR is merged :) v0.4.2 is only a pre-release.
I've cleaned this up a bit:
So I think it's ready for review now. You'll need to rearrange the DLLs so that they're in separate folders (see the top comment).
static NativeApi()
{
    // Try to load a preferred library, based on CPU feature detection
    TryLoadLibrary();
The return value is ignored here. Does the library loading still work?
That should be fine. The DllImport methods still work as normal, so if this fails to load anything they will just try to load the DLL as normal when called.
This is actually really important because on MacOS and Linux the TryLoadLibrary method does nothing! It always "fails" and falls back to the normal behaviour.
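To illustrate the fallback described above, here is a minimal sketch. It is not the code from this PR: the class name, the `Preloaded` property, and the `win-avx2/libllama.dll` path are hypothetical, and it assumes `NativeLibrary.TryLoad` (.NET Core 3.0 or later) is used for the preload.

```csharp
using System;
using System.Runtime.InteropServices;

internal static class NativeApiSketch
{
    // Hypothetical property: records whether a preferred library was preloaded.
    // The PR itself ignores the return value of TryLoadLibrary.
    public static bool Preloaded { get; }

    static NativeApiSketch()
    {
        // A failure here is harmless: DllImport methods resolve the DLL
        // lazily on first call, using the normal search rules.
        Preloaded = TryLoadLibrary();
    }

    // Returns true only if a preferred library was loaded ahead of time.
    private static bool TryLoadLibrary()
    {
        // Outside Windows this sketch does nothing and reports failure,
        // so callers always fall back to the normal DllImport behaviour.
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
            return false;

        // Hypothetical path; TryLoad returns false instead of throwing
        // when the file is missing or cannot be loaded.
        return NativeLibrary.TryLoad("win-avx2/libllama.dll", out _);
    }
}
```

The idea, as I understand it, is that on Windows a module that is already loaded gets reused when a later `DllImport` asks for a library with the same name, which is why the preload only needs to succeed or fail quietly.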
Force-pushed from 765f5ab to 756a1ad.
I've rebased this onto master, so it's fully up to date with all the GGUF changes. I think the only thing left for this PR is to rearrange the native deps (which I'm not completely sure how to do). The required layout is:
MacOS?
At the moment I'm not sure exactly how MacOS should be handled. The basic CPU
Thank you for this work. It seems this could integrate all the DLLs into one backend package. I'll try to add the AVX binaries and test it.
I'm not sure how the CUDA stuff will work if it's all in one package. If it's all in one package I think the CUDA binaries would load (because they exist) but then fail at runtime if CUDA isn't supported. If that's true we'd either need to add a runtime check for CUDA compatibility, or keep them in a separate package.
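One possible shape for the runtime check mentioned above, purely as a sketch: probe for the NVIDIA driver library (`nvcuda.dll` on Windows) before choosing a CUDA build. The class and method names are hypothetical, and this only shows that a driver is installed. It does not prove that a usable GPU is present or that the driver supports CUDA 11 or 12.

```csharp
using System;
using System.Runtime.InteropServices;

public static class CudaProbe
{
    // Hypothetical helper: true if the CUDA driver library can be loaded.
    public static bool DriverLooksAvailable()
    {
        // This sketch only covers Windows, matching the scope of the PR.
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
            return false;

        // nvcuda.dll is installed by the NVIDIA display driver.
        if (NativeLibrary.TryLoad("nvcuda.dll", out IntPtr handle))
        {
            // Only probing: release the handle straight away.
            NativeLibrary.Free(handle);
            return true;
        }

        return false;
    }
}
```

If a check like this returns false, the loader could skip the CUDA folders and go straight to the CPU builds.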
Hey Martin, I really appreciate this good work, but I'm afraid we'll have to delay this feature to the next version. I tested it on my PC and found something strange. For example,
I think we should publish the new version quickly since the file format has been changed from
That's absolutely fine! GGUF is a really critical feature to get out ASAP, whereas this can wait.
We'll need to test this extensively on different hardware before the next release!
Motivation
During testing I noticed that the avx512 libllama.dll which I was using seemed to be much faster than the default CPU backend.
Proposed Change
This draft PR is a demonstration of a new way to load the native dependencies which we could use. This system performs feature detection on the CPU and then loads up the best DLL that the CPU can run. It is only written for Windows at the moment, but it could easily be extended to other platforms.
Why Is This A Draft?
Using this would require some modification to the backend package which I don't know how to do:
Modify LLamaSharp.Backend.Cpu to contain three folders:
- win-avx512/libllama.dll
- win-avx2/libllama.dll
- win-avx/libllama.dll
- libllama.dll - whatever the most basic default should be (e.g. no AVX support at all?)
Modify LLamaSharp.Backend.Cuda11 to contain:
- win-cuda11/libllama.dll
Modify LLamaSharp.Backend.Cuda12 to contain:
- win-cuda12/libllama.dll
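The CPU folder layout above maps naturally onto a "best supported feature first" lookup. A rough sketch of that selection follows. `LibraryPicker` and its methods are hypothetical names, not the PR's implementation. It uses the `Avx` and `Avx2` intrinsics classes from `System.Runtime.Intrinsics.X86`, and AVX-512 support is passed in as a parameter because detecting it depends on the runtime version.

```csharp
using System;
using System.IO;
using System.Runtime.Intrinsics.X86;

public static class LibraryPicker
{
    // Picks the folder for the most capable instruction set available.
    // An empty string means "use the default libllama.dll in the root".
    public static string PickFolder(bool avx512, bool avx2, bool avx)
    {
        if (avx512) return "win-avx512";
        if (avx2) return "win-avx2";
        if (avx) return "win-avx";
        return "";
    }

    // Builds the relative DLL path for the current CPU.
    // avx512Supported is an assumption supplied by the caller.
    public static string PickForCurrentCpu(bool avx512Supported = false)
    {
        string folder = PickFolder(avx512Supported, Avx2.IsSupported, Avx.IsSupported);
        // Path.Combine skips the empty folder, leaving just the file name.
        return Path.Combine(folder, "libllama.dll");
    }
}
```

Keeping `PickFolder` free of hardware queries makes the ordering easy to test on any machine.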