Ifu rocm enabled 09 23 2025 #90

pnunna93 · 2025-09-23T17:17:03Z

Motivation

Merge latest upstream changes into ROCm fork

Test Plan

Reviewed all unit tests

Test Result

==================================== PASSES ====================================
= 2998 passed, 901 skipped, 184 deselected, 18 xfailed, 763 warnings in 577.31s (0:09:37) =

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

…ion#1683) * Port ROCm changes from multi-backend-refactor branch * Update ops.py * Update functional.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update functional.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update functional.py * Update functional.py * Update functional.py * Update functional.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update functional.py * Update functional.py * Update functional.py * Update test_ops.py * Update test_functional.py * Update test_ops.py * Update test_functional.py * Update test_functional.py * Update functional.py * Update functional.py * Update ops.py * Update ops.py * Update test_functional.py * Update test_functional.py * Update cextension.py * Update cuda_specs.py * Update cuda_specs.py * Update test_functional.py * Update test_linear4bit.py * Update test_cuda_setup_evaluator.py * Update test_functional.py * Update modules.py * Update modules.py * Update ops.py * Update test_linear4bit.py * Update ops.py * Update ops.py * Update test_linear4bit.py * Update test_linear4bit.py * Update python-package.yml * Update python-package.yml * Update python-package.yml * Update python-package.yml * Create build-rocm.sh * Update cuda_specs.py * Fix trailing whitespace * Remove conflicts.diff * update for hipblasVersionMajor >=3 * Update test_functional.py * Update test_linear4bit.py * Update test_ops.py * Update main.py * Update test_functional.py * Update test_linear4bit.py * Update test_ops.py * Update test_linear4bit.py * Lint * Lint * Update helpers.py * Update test_functional.py * Update test_linear4bit.py * Update test_ops.py * Lint * Update pythonInterface.cpp * lint fix * lint * Update pythonInterface.cpp * revert permissions change * Fix indentation * Update kernels_hip.cuh * Update kernels.hip * Update ops.hip * Update ops_hip.cuh * Update kernels_hip.cuh * Update kernels.hip * 
Update kernels.hip * Update ops.hip * Update ops_hip.cuh * Update ops.hip * Update CMakeLists.txt * Update functional.py * Update cextension.py * Update cextension.py --------- Co-authored-by: MISHANMAURYA <[email protected]> Co-authored-by: MISHANMAUYRA <[email protected]> Co-authored-by: amcamd <[email protected]> Co-authored-by: Prasanth Nunna <[email protected]>

* Add CUDA 12.9 to build/test workflows * Downgrade Jimver/cuda-toolkit to v0.2.24 * Update python-package.yml * Update python-package.yml * Update python-package.yml * Update tests.yml * Update tests.yml

Signed-off-by: jiqing-feng <[email protected]>

* Add torch 2.8 rc / 2.9 nightly to tests * Update tests.yml * Update tests.yml

…ation#1512)

* Automatically call CMake as part of PEP 517 build

  Call CMake and build the CPU extension when invoking the build via a PEP 517 backend, to ensure that at least some extension is built when users are building from source. This improves consistency with other Python packages, and reduces the risk of accidents.

  We are using `scikit-build-core` setuptools plugin to take care of CMake dependencies and call into CMake. However, we need to modify the `build_py` command to ensure that CMake is called prior to the setuptools command, as otherwise the newly built shared library won't be picked up by `build_py`.

  Since setuptools is still responsible for collecting the Python package, it also collects all other shared libraries that were built earlier, for example via manual CMake calls as done in the CI pipeline.

  Furthermore, if the user does not have `scikit-build-core` installed and calls `setup.py` directly, we output a warning but continue working as before.

  The logic can be further extended in the future, for example to detect the best COMPUTE_BACKEND default.

  Fixes bitsandbytes-foundation#1511

* Include C sources and build files in source distribution
* Fix formatting
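
The ordering described in this commit message (CMake first, then `build_py`) can be sketched as a setuptools `cmdclass` override. This is a minimal illustration and not the code from the commit: the class name `BuildPyAfterCMake` and the `build_cmake` sub-command name are assumptions, and the `setup()` call itself is left out.

```python
from setuptools.command.build_py import build_py


class BuildPyAfterCMake(build_py):
    """Hypothetical build_py variant that runs the CMake step first."""

    # Name of the sub-command that invokes CMake; assumed for illustration.
    cmake_command = "build_cmake"

    def run(self):
        # Build the native library before build_py collects package files,
        # so the freshly built shared library is present when it scans.
        self.run_command(self.cmake_command)
        super().run()


# Mapping that would be passed as setup(cmdclass=...).
cmdclass = {"build_py": BuildPyAfterCMake}
```

Keeping the override this small leaves setuptools in charge of collecting the package, which matches the message's point that libraries built earlier by manual CMake calls are still picked up.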

Signed-off-by: jiqing-feng <[email protected]>

fix log

…/inf_benchmark [XPU] Add inference benchmark for XPU

…/8bit_int Add kernel registration for 8bit and 32bit optimizers

…ndation/add-funding Create FUNDING.yml

…ndation/adjust-cuda-build Add Volta support in cu128/cu129 builds

Signed-off-by: cyy <[email protected]>

* Fix unused variable warnings and other ruff warnings Signed-off-by: cyy <[email protected]> * Fix format Signed-off-by: cyy <[email protected]> --------- Signed-off-by: cyy <[email protected]>

* add int mm for xpu after torch 2.9 Signed-off-by: jiqing-feng <[email protected]> * add packaging on pyproject Signed-off-by: jiqing-feng <[email protected]> --------- Signed-off-by: jiqing-feng <[email protected]>

…foundation#1728) * for intel xpu case, use MatMul8bitFp even not use ipex Signed-off-by: Liu, Kaixuan <[email protected]> * fix lint issue Signed-off-by: Liu, Kaixuan <[email protected]> --------- Signed-off-by: Liu, Kaixuan <[email protected]>

…on#1720) * Add parametrize util for targeting parameters outside of nn.Linear modules * Parametrize 4bit: replace existing prequantized weight * cleanup * Add caching for parametrization * Add tests * Fix tests * Guard for torch < 2.5 * Guard for torch < 2.5 * Another test guard for torch >= 2.5

…-foundation#1749)

* Test suite improvements for MPS/XPU/HPU * Skip test on torch==2.8.0+cpu for Windows regression

…#1710) * Implemented 32bit optimizers in triton * Modify Comments * Optimizing pure torch implementation * Restore the order of parameters and modify the position of pure pytorch implementation * Restore files permissions --------- Co-authored-by: Fanli Lin <[email protected]>

* Add SYCL Kernels for XPU backend * fix transpose Signed-off-by: jiqing-feng <[email protected]> * fix log and format Signed-off-by: jiqing-feng <[email protected]> * revert cpu changes Signed-off-by: jiqing-feng <[email protected]> * clean ipex_xpu Signed-off-by: jiqing-feng <[email protected]> * clean ipex import Signed-off-by: jiqing-feng <[email protected]> * fix ipex cpu import Signed-off-by: jiqing-feng <[email protected]> * fix typo Signed-off-by: jiqing-feng <[email protected]> * fix comments Signed-off-by: jiqing-feng <[email protected]> * refine gemv_4bit kernel * enable FP4 for dequant_4bit and gemv_4bit * refine FP4 dequantization performance * remove check for better performance Signed-off-by: jiqing-feng <[email protected]> * fix doc Signed-off-by: jiqing-feng <[email protected]> * clean code * fix tests Signed-off-by: jiqing-feng <[email protected]> * rm comments Signed-off-by: jiqing-feng <[email protected]> * fix memory issue * fix ut failure * adjust threshold Signed-off-by: jiqing-feng <[email protected]> * fix xpu check Signed-off-by: jiqing-feng <[email protected]> * change test_functional check Signed-off-by: jiqing-feng <[email protected]> * fix test_module Signed-off-by: jiqing-feng <[email protected]> * fix device check Signed-off-by: jiqing-feng <[email protected]> * fix tests Signed-off-by: jiqing-feng <[email protected]> * Enable Windows build and refine code * fix xpu log Signed-off-by: jiqing-feng <[email protected]> * remove ipex entirely Signed-off-by: jiqing-feng <[email protected]> * fix cpu int8 CB Signed-off-by: jiqing-feng <[email protected]> * fix lint Signed-off-by: jiqing-feng <[email protected]> * fix logs (#12) * fix logs Signed-off-by: jiqing-feng <[email protected]> * fix format Signed-off-by: jiqing-feng <[email protected]> --------- Signed-off-by: jiqing-feng <[email protected]> * Fix sycl lint error and tests (#13) * fix sycl nd Signed-off-by: jiqing-feng <[email protected]> * fix tests Signed-off-by: jiqing-feng 
<[email protected]> --------- Signed-off-by: jiqing-feng <[email protected]> * skip typo check for xpu kernel codes (#14) * skip test for xpu ops Signed-off-by: jiqing-feng <[email protected]> * fix lint Signed-off-by: jiqing-feng <[email protected]> * skip typo for xpu Signed-off-by: jiqing-feng <[email protected]> * skip Signed-off-by: jiqing-feng <[email protected]> * skip Signed-off-by: jiqing-feng <[email protected]> --------- Signed-off-by: jiqing-feng <[email protected]> * register triton kernel for quantization (#15) Signed-off-by: jiqing-feng <[email protected]> * Fix version comparison issue (#18)

# Description

The version comparison expression references the `.release` property on only one side of the comparison. This leads to a comparison between a tuple and a `Version` object.

# Error message

```
The 8-bit optimizer is not available on your device, only available on CUDA for now.
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
Traceback (most recent call last):
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/unsloth_validation/run.py", line 1, in <module>
    import unsloth
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/__init__.py", line 235, in <module>
    from .models import *
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/__init__.py", line 15, in <module>
    from .llama import FastLlamaModel
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/llama.py", line 23, in <module>
    from ._utils import *
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/_utils.py", line 89, in <module>
    from unsloth_zoo.patching_utils import (
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth_zoo/patching_utils.py", line 629, in <module>
    import transformers.integrations.bitsandbytes
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py", line 20, in <module>
    import bitsandbytes as bnb
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/bitsandbytes/bitsandbytes/__init__.py", line 39, in <module>
    from .backends.xpu import ops as xpu_ops
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/bitsandbytes/bitsandbytes/backends/xpu/ops.py", line 17, in <module>
    if version.parse(torch.__version__).release >= version.parse("2.9"):
TypeError: '>=' not supported between instances of 'tuple' and 'Version'
```

--------- Signed-off-by: jiqing-feng <[email protected]> Co-authored-by: jiqing-feng <[email protected]> Co-authored-by: Er-Xin (Edwin) Shang <[email protected]>

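The `TypeError` in the "Fix version comparison issue (#18)" traceback comes from comparing a release tuple with a `Version` object. The sketch below uses `packaging.version` to reproduce the mismatched form and to show one way of putting both sides on the same type. The helper names and the nightly version string are hypothetical, and the exact change made in the commit is not shown in this log, so treat the "fixed" form as an assumption.

```python
from packaging import version


def release_at_least(installed: str, minimum: str) -> bool:
    """Return True if the release tuple of `installed` is >= that of `minimum`."""
    # Both sides are plain tuples of ints, so the comparison is well defined.
    return version.parse(installed).release >= version.parse(minimum).release


def mismatched_compare(installed: str, minimum: str) -> bool:
    """Reproduce the failing pattern: tuple on the left, Version on the right."""
    return version.parse(installed).release >= version.parse(minimum)


# Hypothetical nightly version string, for illustration only.
nightly = "2.9.0.dev20250801+xpu"

print(release_at_least(nightly, "2.9"))
print(release_at_least("2.8.0", "2.9"))

try:
    mismatched_compare(nightly, "2.9")
except TypeError as exc:
    print(f"TypeError: {exc}")
```

Comparing release tuples also means that a development build such as the string above is treated as 2.9, whereas comparing full `Version` objects would order it before 2.9.
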
…1692) * implemented 8bit optimizers * Add interface * Commented out torch checks * Merged * Updated kernels * Reused code for quant/dequant * Removed empty line * Changed Readme

…1755)

* Bump minimum PyTorch to 2.3 * Tests: Fix Windows numpy<2 compatibility for torch<2.4.1

…antization (bitsandbytes-foundation#1746) * Added branchless LUT-based dequantization for FP4 and NF4 * Added extra command line options to control reproducibility * Restore FP4 quantization/dequantization order
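
The lookup-table idea named in this commit can be illustrated in a few lines of Python. This is only a sketch of the technique, not the GPU kernel from the commit: the function name `dequantize_nf4` is hypothetical, the packing order (first value in the high nibble) is an assumption, and the table holds the 16 NF4 levels as commonly published, which should be checked against the library before being relied on.

```python
# NF4 code book: 16 levels in [-1, 1], indexed directly by the 4-bit code.
# Values as commonly published for NF4; treated as an assumption here.
NF4_LUT = (
    -1.0,
    -0.6961928009986877,
    -0.5250730514526367,
    -0.39491748809814453,
    -0.28444138169288635,
    -0.18477343022823334,
    -0.09105003625154495,
    0.0,
    0.07958029955625534,
    0.16093020141124725,
    0.24611230194568634,
    0.33791524171829224,
    0.44070982933044434,
    0.5626170039176941,
    0.7229568362236023,
    1.0,
)


def dequantize_nf4(packed: bytes, absmax: float) -> list[float]:
    """Decode two 4-bit codes per byte by table lookup, then rescale by absmax."""
    out = []
    for byte in packed:
        # Assumed packing: first value in the high nibble, second in the low.
        # Each code indexes the table directly, so no per-code branching is needed.
        out.append(NF4_LUT[byte >> 4] * absmax)
        out.append(NF4_LUT[byte & 0x0F] * absmax)
    return out


print(dequantize_nf4(bytes([0x0F, 0x7F]), absmax=2.0))
```

Indexing a fixed table replaces a chain of comparisons on the code value with a single load, which is the property the commit title calls branchless.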

…#1757) * Add function to reverse 4bit weights for HPU * Fix lint error

lcskrishna

LGTM.

lcskrishna · 2025-09-24T02:45:32Z

Skipped unit tests for reference:
359d545

* Port ROCm changes from multi-backend-refactor branch * Update ops.py * Update functional.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update functional.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update functional.py * Update functional.py * Update functional.py * Update functional.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update functional.py * Update functional.py * Update functional.py * Update test_ops.py * Update test_functional.py * Update test_ops.py * Update test_functional.py * Update test_functional.py * Update functional.py * Update functional.py * Update ops.py * Update ops.py * Update test_functional.py * Update test_functional.py * Update cextension.py * Update cuda_specs.py * Update cuda_specs.py * Update test_functional.py * Update test_linear4bit.py * Update test_cuda_setup_evaluator.py * Update test_functional.py * Update modules.py * Update modules.py * Update ops.py * Update test_linear4bit.py * Update ops.py * Update ops.py * Update test_linear4bit.py * Update test_linear4bit.py * Update python-package.yml * Update python-package.yml * Update python-package.yml * Update python-package.yml * Create build-rocm.sh * Update cuda_specs.py * Fix trailing whitespace * Remove conflicts.diff * update for hipblasVersionMajor >=3 * Update test_functional.py * Update test_linear4bit.py * Update test_ops.py * Update main.py * Update test_functional.py * Update test_linear4bit.py * Update test_ops.py * Update test_linear4bit.py * Lint * Lint * Update helpers.py * Update test_functional.py * Update test_linear4bit.py * Update test_ops.py * Lint * Update pythonInterface.cpp * lint fix * lint * Update pythonInterface.cpp * revert permissions change * Fix indentation * Update kernels_hip.cuh * Update kernels.hip * Update ops.hip * Update ops_hip.cuh * Update kernels_hip.cuh * Update kernels.hip * Update 
kernels.hip * Update ops.hip * Update ops_hip.cuh * Update ops.hip * Update CMakeLists.txt * Update functional.py * Update cextension.py * Update cextension.py * warpSize is being made non constexpr in ROCm 7.0 * Merge pull request ROCm#90 from ROCm/IFU-rocm_enabled-09-23-2025 Ifu rocm enabled 09 23 2025 * Fix typo * unskip test_4bit_quant --------- Co-authored-by: MISHANMAURYA <[email protected]> Co-authored-by: MISHANMAUYRA <[email protected]> Co-authored-by: amcamd <[email protected]> Co-authored-by: Prasanth Nunna <[email protected]> Co-authored-by: sstamenk <[email protected]>

pnunna93 and others added 30 commits June 20, 2025 13:18

Fix AdamW documentation (bitsandbytes-foundation#1686)

fd2949a

Make minor improvements to optimizer.py (bitsandbytes-foundation#1687)

aca9778

Add CUDA 12.9 build (bitsandbytes-foundation#1689)

1abd5e7

* Add CUDA 12.9 to build/test workflows * Downgrade Jimver/cuda-toolkit to v0.2.24 * Update python-package.yml * Update python-package.yml * Update python-package.yml * Update tests.yml * Update tests.yml

Temporarily disable HPU tests

6d0a5cd

fix triton kernel on the correct device (bitsandbytes-foundation#1691)

bdcee0f

Signed-off-by: jiqing-feng <[email protected]>

Update README.md

e28d4d9

CI: Test with PyTorch 2.8.0 RC (bitsandbytes-foundation#1693)

ed398d2

* Add torch 2.8 rc / 2.9 nightly to tests * Update tests.yml * Update tests.yml

Added inference benchmark

3278614

fix log

ea4b59f

Signed-off-by: jiqing-feng <[email protected]>

Merge pull request bitsandbytes-foundation#1697 from jiqing-feng/log

ee01736

fix log

Merge pull request bitsandbytes-foundation#1696 from Egor-Krivov/egor…

adc7fda

…/inf_benchmark [XPU] Add inference benchmark for XPU

Add interface for 8bit optimizer

b43edf5

Fixed bugs

35ce337

enabled tests

abf4a1e

Add 32bit optimizer interface

3b89a05

Add no_cpu for optimizers

223fea5

Update to kernel registration

4075a64

Reverse lion

236124e

Changed number of errors

36f5c4f

Removed cpu

24d9139

Added mutated args to the schema

e33ba1c

Fixed default args

0f6fe6b

Merge pull request bitsandbytes-foundation#1706 from Egor-Krivov/egor…

941681d

…/8bit_int Add kernel registration for 8bit and 32bit optimizers

Test fix

14147f6

Create FUNDING.yml

df67c70

Merge pull request bitsandbytes-foundation#1714 from bitsandbytes-fou…

33449ee

…ndation/add-funding Create FUNDING.yml

Add Volta support in cu128/cu129 builds

ec19229

Merge pull request bitsandbytes-foundation#1715 from bitsandbytes-fou…

e54dc12

…ndation/adjust-cuda-build Add Volta support in cu128/cu129 builds

matthewdouglas and others added 22 commits August 11, 2025 15:00

Bump dev version

9088107

Restore temporary changes from release

7bfe923

add py.typed (bitsandbytes-foundation#1726)

ff389db

Signed-off-by: cyy <[email protected]>

Enable F841 (bitsandbytes-foundation#1727)

c76e208

* Fix unused variable warnings and other ruff warnings Signed-off-by: cyy <[email protected]> * Fix format Signed-off-by: cyy <[email protected]> --------- Signed-off-by: cyy <[email protected]>

add int mm for xpu after torch 2.9 (bitsandbytes-foundation#1736)

a09d05a

* add int mm for xpu after torch 2.9 Signed-off-by: jiqing-feng <[email protected]> * add packaging on pyproject Signed-off-by: jiqing-feng <[email protected]> --------- Signed-off-by: jiqing-feng <[email protected]>

Adjust 4bit test tolerance on CPU for larger blocksizes (bitsandbytes…

d731fc4

…-foundation#1749)

Test improvements (bitsandbytes-foundation#1750)

6a07ffe

* Test suite improvements for MPS/XPU/HPU * Skip test on torch==2.8.0+cpu for Windows regression

Lint fix

d848d4d

Lint fix

4b02574

[XPU] Implemented 8bit optimizers in triton (bitsandbytes-foundation#…

404e277

…1692) * implemented 8bit optimizers * Add interface * Commented out torch checks * Merged * Updated kernels * Reused code for quant/dequant * Removed empty line * Changed Readme

Drop Maxwell (sm50) build from distribution (bitsandbytes-foundation#…

dd1929b

…1755)

Bump minimum PyTorch to 2.3 (bitsandbytes-foundation#1754)

c9bce2b

* Bump minimum PyTorch to 2.3 * Tests: Fix Windows numpy<2 compatibility for torch<2.4.1

Update log (bitsandbytes-foundation#1758)

b2a8a15

Add function to reverse 4bit weights for HPU (bitsandbytes-foundation…

2adcb7a

…#1757) * Add function to reverse 4bit weights for HPU * Fix lint error

Update README.md

e817036

Merge upstream/main into ROCm/rocm_enabled

0507a45

Skip unsupported tests on ROCm

359d545

pnunna93 requested a review from lcskrishna September 23, 2025 17:17

pnunna93 added 3 commits September 23, 2025 18:39

update kernels.hip with latest upstream

9f74744

Import missing modules

7ba4fb4

Fix lint errors.

36da3e1

lcskrishna approved these changes Sep 24, 2025


lcskrishna merged commit 2e65b38 into rocm_enabled Sep 24, 2025
47 checks passed
