Tidy Android Instructions README.md #7016
Removed CLBlast instructions (outdated); added OpenBLAS.
Added apt install git so that git clone works.
Is OpenBLAS actually worth using on Android? For quantized models, it may be faster without it. Ultimately, though, without the OpenCL instructions, this basically reads as "install Termux and follow the normal build instructions for Linux", so it might be simpler to say just that.
I like leaving the decision to the user if
Agreed.
Linked to Linux build instructions
I build with OpenBLAS on Android, not that it matters. My input is, unfortunately, anecdotal. Is the difference really negligible? To be honest, it's harder to tell on a phone.
The easiest way to tell if OpenBLAS helps would be to run
CPU is definitely faster with quants on my device: CPU:
I had to update, fix the convert script by adding the hash, then upload the model I use, rebuild, and download the quant. I also have a bunch of other scripts running, so I'll post results once it's all set.
CPU is much faster! Why is that?
I think
Thank you. I'll try various options and post results later.
Tested. Here are some quick numbers, loading from shared: load from shared & load from load from Based on these figures,
* Tidy Android Instructions README.md
  Remove CLBlast instructions(outdated), added OpenBlas.
* don't assume git is installed
  Added apt install git, so that git clone works
* removed OpenBlas
  Linked to Linux build instructions
* fix typo
  Remove word "run"
* correct style
  Co-authored-by: slaren <[email protected]>
* correct grammar
  Co-authored-by: slaren <[email protected]>
* delete reference to Android API
* remove Fdroid reference, link directly to Termux
  Fdroid is not required
  Co-authored-by: slaren <[email protected]>
* Update README.md
  Co-authored-by: slaren <[email protected]>

---------

Co-authored-by: slaren <[email protected]>
Tested with TinyLlama-1.1B-Chat-v1.0-Q8_0.gguf using Load from shared, Load from The results are nearly identical. TinyLlama (1.09 GiB) is probably too small to show a difference in this test; even mmap made no difference. I'll leave benchmarking larger models to someone with a better device than mine.
Hey everyone,

As the original author of these README instructions, I have to admit that I now see how they might cause more confusion than clarity. To clarify for future users: I've personally found CLBlast to be quite effective with llama.cpp, especially for certain model families such as StableLM and OpenLlama (provided you're not offloading layers). In my experience, it has boosted prompt processing speed by roughly 40%.

However, while CLBlast does offer significant speed improvements, it's plagued by bugs. For many model families, and even within the subsets mentioned above when offloading layers, it tends to produce nonsensical output. This is disappointing, considering the untapped potential of the GPUs in our smartphones. If I can assist in any way, I'd like to offer a few insights based on my experimentation:
Here's hoping that Vulkan proves to be a more robust solution than OpenGL.
Did your CLBlast experience involve running the corresponding tuners to achieve that speed on your device?
No, I haven't tried the tuners yet. That's a nice experiment to do. Thanks for the idea!
Is this specific to Android builds, or can it be reproduced on PC too?
As far as I know, it only happens with Android builds. All my tests were conducted on Snapdragon SoCs with Adreno GPUs.
It's better to tidy the README regarding CLBlast instructions for Android. Removed CLBlast instructions (outdated). Simplified Android CPU build instructions.