From the blog I noticed that FlashInfer implements low-precision attention kernels so that we can achieve nearly linear speedup to the compression ratio (~4x for 4bit, ~2x for 8bit). This feature is great! and I try to use it. But there is no demo or toy code about how to use it. Could you please share more details about it?