[bitsandbytes]: support read bnb pre-quantized model #5753
Conversation
(force-pushed 69decab to 95dabaa)
The changes from @thesues improve my previous work on QLoRA & BnB (#4776). It now supports models whose weights are published as bnb-quantized. It also cleans up my previous code and fixes a bug (the previous version would run into an error in scenarios such as GQA). The changes look good to me. Could you also take a look?
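For anyone trying this out, here is a minimal sketch of loading a bnb pre-quantized checkpoint through vLLM's Python API (the model id is illustrative, and the `quantization="bitsandbytes"` / `load_format="bitsandbytes"` arguments are my assumption of the options this PR wires up):

```python
from vllm import LLM, SamplingParams

# Hypothetical example: a checkpoint whose weights are already stored
# in bitsandbytes 4-bit (NF4) format on the Hugging Face Hub.
llm = LLM(
    model="unsloth/llama-3-8b-bnb-4bit",  # illustrative pre-quantized model id
    quantization="bitsandbytes",          # assumed quantization method name
    load_format="bitsandbytes",           # assumed load format used by this PR
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```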
I'm not fond of the term "prequant" here, could it be something along the lines of "quantized_checkpoint"?
sure
What isn't supported about it? It seems like no exception is being thrown
A typo here: it should be "only quant_state.bitsandbytes__nf4 is supported". Other libraries such as HF Transformers also support quant_state.bitsandbytes__fp4.
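For context, bitsandbytes records the 4-bit data type in the suffix of the serialized quant-state key, so a checkpoint can be inspected to tell NF4 from FP4 before loading. A rough sketch (the key naming follows the usual bitsandbytes Params4bit serialization convention; the helper itself is hypothetical):

```python
from safetensors import safe_open

# Hypothetical helper: scan one checkpoint shard and report whether its
# 4-bit quant state was saved as NF4 or FP4, based on the key suffix
# bitsandbytes uses when serializing Params4bit quant_state.
def detect_bnb_4bit_type(shard_path: str):
    with safe_open(shard_path, framework="pt") as f:
        for key in f.keys():
            if key.endswith("quant_state.bitsandbytes__nf4"):
                return "nf4"
            if key.endswith("quant_state.bitsandbytes__fp4"):
                return "fp4"
    return None

print(detect_bnb_4bit_type("model-00001-of-00002.safetensors"))
```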
Ditto on pre_quant, it should be talking about the checkpoint being quantized
Appreciate the improvements and the ability to natively load! I think it would be great to follow up with a documentation page in the quantization section showing how to deploy bnb models directly in vLLM, perhaps straight from a quick finetune in unsloth.
(force-pushed fb0a6d2 to b7c3aae)
Sure, I added a very simple bnb.rst in docs.
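For readers following along, here is a minimal sketch of the kind of example such a docs page might show, in this case in-flight quantization of an unquantized checkpoint (the model id and argument names are illustrative assumptions, not quoted from the added bnb.rst):

```python
from vllm import LLM

# Hypothetical docs example: quantize an unquantized FP16/BF16 checkpoint
# with bitsandbytes on the fly while loading it into vLLM.
llm = LLM(
    model="huggyllama/llama-7b",   # illustrative unquantized model
    dtype="bfloat16",
    quantization="bitsandbytes",   # assumed option name
    load_format="bitsandbytes",    # assumed option name
)
```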
@mgoin can you review this version?
docs/source/quantization/bnb.rst (outdated)
Redundant comma.
Is this PR good to merge now? It would be really great if we could get it in before the next scheduled release (#6433)!
(force-pushed 8e891bb to bb014ef)
SGTM
Is there anything I could do to improve this patch?
Any ETA for this feature to be merged? Really keen to use it!
Thanks for pinging, LGTM with a small docs fix
As I mentioned earlier, looks good to me!
@thesues do you think you could resolve the new conflicts?
I've just tried installing from this PR's branch and testing it out, and this solution worked for me! It used to produce issues like: ... but now model loading seems to work great:
It would be great if someone could validate that the new Llama 3.1 8B BNB checkpoint loads: https://huggingface.co/hugging-quants/Meta-Llama-3.1-8B-Instruct-BNB-NF4
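For anyone else verifying, a quick smoke test along these lines should be enough to confirm the checkpoint loads and generates (argument names assumed as in the earlier sketch, not confirmed against this branch):

```python
from vllm import LLM, SamplingParams

# Smoke test: load the pre-quantized NF4 checkpoint linked above and
# generate a short completion.
llm = LLM(
    model="hugging-quants/Meta-Llama-3.1-8B-Instruct-BNB-NF4",
    quantization="bitsandbytes",   # assumed option name
    load_format="bitsandbytes",    # assumed option name
)
out = llm.generate(["The capital of France is"], SamplingParams(max_tokens=8))
print(out[0].outputs[0].text)
```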
Meta-Llama-3.1-8B-Instruct-BNB-NF4 works as expected. Logs:
…el/omost-llama-3-8b-4bits
Co-authored-by: Michael Goin <[email protected]>
Done.
Thanks a lot for testing further and sticking with this, LGTM!
Co-authored-by: Michael Goin <[email protected]> Signed-off-by: Alvant <[email protected]>
Co-authored-by: Michael Goin <[email protected]> Signed-off-by: LeiWang1999 <[email protected]>
There are bitsandbytes pre-quantized models on Hugging Face, such as:
...
This PR adds support for loading these pre-quantized models in vLLM.
@chenqianfzh @Yard1 @jeejeelee