This is an FP16 x FP6 mixed matmul kernel optimized for IO-bound workloads, following [FP6-LLM](https://arxiv.org/abs/2401.14112). The actual CUDA kernel is located under [csrc/cuda/fp6_llm/](../../csrc/cuda/fp6_llm/). This module provides helper functions to quantize FP32 weights to FP6, as well as facilities to convert existing models to FP6.
## Usage
```python
from torchao.prototype.fp6_llm import convert_fp6_llm

model = ...
convert_fp6_llm(model)  # convert model in-place, replacing nn.Linear modules with Fp6LlmLinear
```
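
For a more concrete picture, here is a minimal end-to-end sketch (not from the original README): it builds a toy model, converts it in-place, and runs a forward pass. Only `convert_fp6_llm` comes from this module; the model shapes, CUDA placement, and the use of FP16 activations (to match the FP16 x FP6 kernel) are assumptions.

```python
import torch
from torch import nn

from torchao.prototype.fp6_llm import convert_fp6_llm

# Toy model (shapes are arbitrary); FP32 weights are quantized to FP6
# as part of the conversion, per the description above.
model = nn.Sequential(nn.Linear(1024, 4096), nn.Linear(4096, 1024)).cuda()
convert_fp6_llm(model)  # in-place: nn.Linear -> Fp6LlmLinear

# The kernel is FP16 x FP6, so FP16 activations are assumed here.
x = torch.randn(8, 1024, dtype=torch.half, device="cuda")
with torch.no_grad():
    y = model(x)
print(y.shape)  # torch.Size([8, 1024])
```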