@zzh-www I have reverted the auto bf16 code. You can still run in bf16 by passing a torch_dtype override. However, based on my full kernel output testing, BF16 (even with the Marlin kernel) shows 2x to 10x the raw accuracy drift unless you use the slower Triton/Torch kernels. This is very bad for model accuracy. I fully recommend running the model in FP16, even in vLLM.
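The direction of the drift is consistent with the two formats' mantissa widths: FP16 keeps 10 explicit mantissa bits while BF16 keeps only 7, so near 1.0 BF16's rounding step is 8x coarser. A minimal pure-Python sketch of that difference (the `to_bf16`/`to_fp16` helpers below are illustrative conversions, not part of GPTQModel or vLLM):

```python
import struct

def to_fp16(x: float) -> float:
    # Round-trip through IEEE 754 half precision (struct's "e" format).
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_bf16(x: float) -> float:
    # Truncate a float32 to bfloat16 with round-to-nearest-even
    # on the discarded low 16 bits.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bias = 0x7FFF + ((bits >> 16) & 1)
    rounded = ((bits + bias) & 0xFFFF0000) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", rounded))[0]

v = 1.0 + 1 / 1024          # smallest increment above 1.0 that FP16 can hold
print(to_fp16(v))           # 1.0009765625 — exactly representable in FP16
print(to_bf16(v))           # 1.0 — BF16's 7 mantissa bits round it away
```

BF16 trades that mantissa precision for FP32's exponent range, which helps training stability but costs per-value resolution at inference time; that trade-off is one plausible reason the quantized kernels drift more under BF16 here.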

Replies: 7 comments 9 replies

Answer selected by zzh-www