Enable bitsandbytes quantization on AMD GPUs that use warp size 32 #27307
Purpose
Adds support for bitsandbytes-quantized models and Unsloth QLoRA on non-Instinct AMD GPUs that use warp size 32.
Requires bitsandbytes #1748 to work.
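For reference, a minimal sketch of how a bitsandbytes checkpoint could be loaded on such a GPU once the patched bitsandbytes build is installed. The checkpoint name is only an example, and the warp_size attribute is assumed to be exposed by recent PyTorch ROCm builds:

```python
# Sketch: check the GPU's wavefront size, then load a 4-bit bitsandbytes
# checkpoint with vLLM. Assumes the bitsandbytes build from #1748.
import torch
from vllm import LLM, SamplingParams

props = torch.cuda.get_device_properties(0)
# On ROCm, warp_size reports the wavefront size: 32 on RDNA-class GPUs,
# 64 on CDNA/Instinct GPUs. getattr guards against older PyTorch builds.
print(f"{props.name}: warp size {getattr(props, 'warp_size', 'unknown')}")

llm = LLM(
    model="unsloth/tinyllama-bnb-4bit",  # example pre-quantized checkpoint
    quantization="bitsandbytes",
    load_format="bitsandbytes",
)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```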
Test Plan
Run the tests in models/quantization/test_bitsandbytes.py.
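For convenience, the suite can also be invoked programmatically; this sketch assumes the path above is relative to vLLM's tests/ directory:

```python
# Sketch: run the bitsandbytes quantization tests through pytest's Python
# API; the tests/ prefix is an assumption about the repository layout.
import sys
import pytest

sys.exit(pytest.main(["tests/models/quantization/test_bitsandbytes.py", "-v"]))
```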
Test Result
Currently: 7 failed, 1 passed, 4 skipped, 5 warnings
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.