Closed
Labels
release (Related to new version release)
Description
We will make a new release as soon as these PRs are merged.
- [Bugfix][Kernel] Fix per-token/per-channel quantization for Hopper scaled mm #12696
- [VLM] Qwen2.5-VL #12604
- [VLM] Add MLA with pure RoPE support for deepseek-vl2 models #12729
- [Perf] Mem align KV caches for CUDA devices (MLA perf improvement) #12676
- [core][distributed] exact ray placement control #12732
- [Bugfix] Better FP8 supported defaults #12796