Description
TL;DR: I propose defaulting -m to models/ + the filename from -mu (or -hff) when one of them is set.
It's easy to misuse these flags, for instance:
./main -mu https://huggingface.co/NousResearch/Meta-Llama-3-70B-Instruct-GGUF/resolve/main/Meta-Llama-3-70B-Instruct-Q5_K_M.gguf -p "Test"
# Wait patiently for 50GB to download
# ...
# Wanna test something else?
./main -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q2_K.gguf -p "Test"
# Oh well, your 50GB model is gone forever now

In a nutshell:
- The workaround (always specify -mu & -m together) is cumbersome:
    ./main -mu https://huggingface.co/NousResearch/Meta-Llama-3-70B-Instruct-GGUF/resolve/main/Meta-Llama-3-70B-Instruct-Q5_K_M.gguf \
        -m models/Meta-Llama-3-70B-Instruct-Q5_K_M.gguf \
        -p "Test"
- It feels weird / wrong that w/o an explicit -m, these quantized models get downloaded to models/7B/ggml-model-f16.gguf
- By default the folder models/7B doesn't exist, and these commands meant to simplify the experience might puzzle first-time users (compare to ollama)
(the only benefit I see to the current behaviour is for people who have profuse bandwidth and a very small hard drive)
I propose changing the default of -m in main & server to models/$( basename $model_url ) if -mu (or -hff) is set, and keeping the legacy models/7B/ggml-model-f16.gguf otherwise.
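To make the rule concrete, here is a rough shell sketch of the default I have in mind. The function name default_model_path is hypothetical and only for illustration; it is not an existing llama.cpp helper, and the real change would presumably live in the argument-parsing code rather than in a script.

```shell
# Hypothetical helper (not part of llama.cpp): compute the default value
# of -m from the value given to -mu (or -hff), as proposed above.
default_model_path() {
    model_url="${1:-}"
    if [ -n "$model_url" ]; then
        # A URL or file name was given: keep only its last path component.
        printf 'models/%s\n' "$(basename "$model_url")"
    else
        # Nothing to derive a name from: fall back to the legacy default.
        printf '%s\n' 'models/7B/ggml-model-f16.gguf'
    fi
}

default_model_path "https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q2_K.gguf"
# prints: models/phi-2.Q2_K.gguf
default_model_path ""
# prints: models/7B/ggml-model-f16.gguf
```

With this rule, the two commands from the example above would write to two different files under models/, so the second download would no longer overwrite the first.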
Happy to send a PR if there's a consensus.