📚 The doc issue
I'm confused about what "the quantization method is supported" means. According to NVIDIA, the Ampere architecture doesn't support FP8 compute. So does this mean FP8 operations are actually supported on A100/A800 GPUs, or just that we can convert the weight parameters from FP16 to FP8?
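For context, here is a minimal sketch of the distinction I'm asking about, assuming the standard NVIDIA compute-capability numbering (`supports_fp8_compute` is a hypothetical helper, not from the docs): FP8 tensor-core math requires compute capability 8.9 (Ada) or 9.0 (Hopper), while Ampere A100/A800 is 8.0, so on Ampere "FP8 support" could only mean weight-only quantization, i.e. weights stored in FP8 and dequantized to FP16/BF16 before the matmul.

```python
# Sketch: FP8 *compute* support vs. FP8 *storage* (weight-only quantization).
# NVIDIA FP8 tensor cores require compute capability >= 8.9 (Ada Lovelace)
# or 9.0 (Hopper); Ampere (A100/A800) is 8.0.
def supports_fp8_compute(major: int, minor: int) -> bool:
    """True if this compute capability has FP8 tensor cores (hypothetical helper)."""
    return (major, minor) >= (8, 9)

print(supports_fp8_compute(8, 0))  # A100/A800 (Ampere) -> False
print(supports_fp8_compute(9, 0))  # H100 (Hopper) -> True
```

On Ampere the compute capability can be queried at runtime (e.g. `torch.cuda.get_device_capability()` returns `(8, 0)` on an A100) to decide which path applies.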
Suggest a potential alternative/fix
No response