@pnunna93
Collaborator

Motivation

Merge latest upstream changes into ROCm fork

Test Plan

Reviewed all unit tests

Test Result

==================================== PASSES ====================================
= 2998 passed, 901 skipped, 184 deselected, 18 xfailed, 763 warnings in 577.31s (0:09:37) =

Submission Checklist

pnunna93 and others added 30 commits June 20, 2025 13:18
…ion#1683)

* Port ROCm changes from multi-backend-refactor branch

* Update ops.py

* Update functional.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update functional.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update functional.py

* Update functional.py

* Update functional.py

* Update functional.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update functional.py

* Update functional.py

* Update functional.py

* Update test_ops.py

* Update test_functional.py

* Update test_ops.py

* Update test_functional.py

* Update test_functional.py

* Update functional.py

* Update functional.py

* Update ops.py

* Update ops.py

* Update test_functional.py

* Update test_functional.py

* Update cextension.py

* Update cuda_specs.py

* Update cuda_specs.py

* Update test_functional.py

* Update test_linear4bit.py

* Update test_cuda_setup_evaluator.py

* Update test_functional.py

* Update modules.py

* Update modules.py

* Update ops.py

* Update test_linear4bit.py

* Update ops.py

* Update ops.py

* Update test_linear4bit.py

* Update test_linear4bit.py

* Update python-package.yml

* Update python-package.yml

* Update python-package.yml

* Update python-package.yml

* Create build-rocm.sh

* Update cuda_specs.py

* Fix trailing whitespace

* Remove conflicts.diff

* update for hipblasVersionMajor >=3

* Update test_functional.py

* Update test_linear4bit.py

* Update test_ops.py

* Update main.py

* Update test_functional.py

* Update test_linear4bit.py

* Update test_ops.py

* Update test_linear4bit.py

* Lint

* Lint

* Update helpers.py

* Update test_functional.py

* Update test_linear4bit.py

* Update test_ops.py

* Lint

* Update pythonInterface.cpp

* lint fix

* lint

* Update pythonInterface.cpp

* revert permissions change

* Fix indentation

* Update kernels_hip.cuh

* Update kernels.hip

* Update ops.hip

* Update ops_hip.cuh

* Update kernels_hip.cuh

* Update kernels.hip

* Update kernels.hip

* Update ops.hip

* Update ops_hip.cuh

* Update ops.hip

* Update CMakeLists.txt

* Update functional.py

* Update cextension.py

* Update cextension.py

---------

Co-authored-by: MISHANMAURYA <[email protected]>
Co-authored-by: MISHANMAUYRA <[email protected]>
Co-authored-by: amcamd <[email protected]>
Co-authored-by: Prasanth Nunna <[email protected]>
* Add CUDA 12.9 to build/test workflows

* Downgrade Jimver/cuda-toolkit to v0.2.24

* Update python-package.yml

* Update python-package.yml

* Update python-package.yml

* Update tests.yml

* Update tests.yml
* Add torch 2.8 rc / 2.9 nightly to tests

* Update tests.yml

* Update tests.yml
…ation#1512)

* Automatically call CMake as part of PEP 517 build

Call CMake and build the CPU extension when invoking the build
via a PEP 517 backend, to ensure that at least some extension is built
when users are building from source.  This improves consistency with
other Python packages, and reduces the risk of accidents.

We are using `scikit-build-core` setuptools plugin to take care of CMake
dependencies and call into CMake.  However, we need to modify
the `build_py` command to ensure that CMake is called prior to
the setuptools command, as otherwise the newly built shared library
won't be picked up by `build_py`.

Since setuptools is still responsible for collecting the Python package,
it also collects all other shared libraries that were built earlier,
for example via manual CMake calls as done in the CI pipeline.
Furthermore, if the user does not have `scikit-build-core` installed
and calls `setup.py` directly, we output a warning but continue working
as before.

The logic can be further extended in the future, for example to detect
the best COMPUTE_BACKEND default.

Fixes bitsandbytes-foundation#1511

* Include C sources and build files in source distribution

* Fix formatting
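
The ordering requirement described in the "Automatically call CMake as part of PEP 517 build" commit above (native library first, Python package collection second) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from that commit: the commit delegates to the `scikit-build-core` setuptools plugin, whereas this sketch shells out to `cmake` directly to make the order visible. The helper names `cmake_commands` and `build_native_then_collect` are hypothetical; `COMPUTE_BACKEND` is the option named in the commit message.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence


def cmake_commands(source_dir: Path, build_dir: Path, backend: str = "cpu") -> List[List[str]]:
    """Return the configure and build invocations for the native library."""
    return [
        # Configure step: select the backend (assumed default here: cpu).
        ["cmake", "-S", str(source_dir), "-B", str(build_dir), f"-DCOMPUTE_BACKEND={backend}"],
        # Build step: produces the shared library that must exist before packaging.
        ["cmake", "--build", str(build_dir)],
    ]


def build_native_then_collect(
    source_dir: Path,
    build_dir: Path,
    collect_python_package: Callable[[], None],
    run: Callable[[Sequence[str]], object] = subprocess.check_call,
) -> None:
    """Run CMake first, then the step that collects the Python package.

    If the order were reversed, the freshly built shared library would not
    exist yet when the package files are collected.
    """
    for command in cmake_commands(source_dir, build_dir):
        run(command)
    collect_python_package()
```

In a real `setup.py` this logic would presumably sit in the `run()` method of a `build_py` subclass, with `collect_python_package` being the parent class's `run`. The `run` parameter exists only so the ordering can be exercised without CMake installed.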
Signed-off-by: jiqing-feng <[email protected]>
…/inf_benchmark

[XPU] Add inference benchmark for XPU
…/8bit_int

Add kernel registration for 8bit and 32bit optimizers
…ndation/adjust-cuda-build

Add Volta support in cu128/cu129 builds
matthewdouglas and others added 22 commits August 11, 2025 15:00
* Fix unused variable warnings and other ruff warnings

Signed-off-by: cyy <[email protected]>

* Fix format

Signed-off-by: cyy <[email protected]>

---------

Signed-off-by: cyy <[email protected]>
* add int mm for xpu after torch 2.9

Signed-off-by: jiqing-feng <[email protected]>

* add packaging on pyproject

Signed-off-by: jiqing-feng <[email protected]>

---------

Signed-off-by: jiqing-feng <[email protected]>
…foundation#1728)

* For the Intel XPU case, use MatMul8bitFp even when IPEX is not used

Signed-off-by: Liu, Kaixuan <[email protected]>

* fix lint issue

Signed-off-by: Liu, Kaixuan <[email protected]>

---------

Signed-off-by: Liu, Kaixuan <[email protected]>
…on#1720)

* Add parametrize util for targeting parameters outside of nn.Linear modules

* Parametrize 4bit: replace existing prequantized weight

* cleanup

* Add caching for parametrization

* Add tests

* Fix tests

* Guard for torch < 2.5

* Guard for torch < 2.5

* Another test guard for torch >= 2.5
* Test suite improvements for MPS/XPU/HPU

* Skip test on torch==2.8.0+cpu for Windows regression
…#1710)

* Implemented 32bit optimizers in triton

* Modify Comments

* Optimizing pure torch implementation

* Restore the order of parameters and modify the position of pure pytorch implementation

* Restore files permissions

---------

Co-authored-by: Fanli Lin <[email protected]>
* Add SYCL Kernels for XPU backend

* fix transpose

Signed-off-by: jiqing-feng <[email protected]>

* fix log and format

Signed-off-by: jiqing-feng <[email protected]>

* revert cpu changes

Signed-off-by: jiqing-feng <[email protected]>

* clean ipex_xpu

Signed-off-by: jiqing-feng <[email protected]>

* clean ipex import

Signed-off-by: jiqing-feng <[email protected]>

* fix ipex cpu import

Signed-off-by: jiqing-feng <[email protected]>

* fix typo

Signed-off-by: jiqing-feng <[email protected]>

* fix comments

Signed-off-by: jiqing-feng <[email protected]>

* refine gemv_4bit kernel

* enable FP4 for dequant_4bit and gemv_4bit

* refine FP4 dequantization performance

* remove check for better performance

Signed-off-by: jiqing-feng <[email protected]>

* fix doc

Signed-off-by: jiqing-feng <[email protected]>

* clean code

* fix tests

Signed-off-by: jiqing-feng <[email protected]>

* rm comments

Signed-off-by: jiqing-feng <[email protected]>

* fix memory issue

* fix ut failure

* adjust threshold

Signed-off-by: jiqing-feng <[email protected]>

* fix xpu check

Signed-off-by: jiqing-feng <[email protected]>

* change test_functional check

Signed-off-by: jiqing-feng <[email protected]>

* fix test_module

Signed-off-by: jiqing-feng <[email protected]>

* fix device check

Signed-off-by: jiqing-feng <[email protected]>

* fix tests

Signed-off-by: jiqing-feng <[email protected]>

* Enable Windows build and refine code

* fix xpu log

Signed-off-by: jiqing-feng <[email protected]>

* remove ipex entirely

Signed-off-by: jiqing-feng <[email protected]>

* fix cpu int8 CB

Signed-off-by: jiqing-feng <[email protected]>

* fix lint

Signed-off-by: jiqing-feng <[email protected]>

* fix logs (#12)

* fix logs

Signed-off-by: jiqing-feng <[email protected]>

* fix format

Signed-off-by: jiqing-feng <[email protected]>

---------

Signed-off-by: jiqing-feng <[email protected]>

* Fix sycl lint error and tests (#13)

* fix sycl nd

Signed-off-by: jiqing-feng <[email protected]>

* fix tests

Signed-off-by: jiqing-feng <[email protected]>

---------

Signed-off-by: jiqing-feng <[email protected]>

* skip typo check for xpu kernel codes (#14)

* skip test for xpu ops

Signed-off-by: jiqing-feng <[email protected]>

* fix lint

Signed-off-by: jiqing-feng <[email protected]>

* skip typo for xpu

Signed-off-by: jiqing-feng <[email protected]>

* skip

Signed-off-by: jiqing-feng <[email protected]>

* skip

Signed-off-by: jiqing-feng <[email protected]>

---------

Signed-off-by: jiqing-feng <[email protected]>

* register triton kernel for quantization (#15)

Signed-off-by: jiqing-feng <[email protected]>

* Fix version comparison issue (#18)

# Description

The version comparison expression takes the `.release` property on only one side of the comparison. As a result it compares a tuple with a `Version` object, which raises a `TypeError` as soon as the XPU backend is imported.

# Error message
```
The 8-bit optimizer is not available on your device, only available on CUDA for now.
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
Traceback (most recent call last):
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/unsloth_validation/run.py", line 1, in <module>
    import unsloth
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/__init__.py", line 235, in <module>
    from .models import *
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/__init__.py", line 15, in <module>
    from .llama     import FastLlamaModel
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/llama.py", line 23, in <module>
    from ._utils import *
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/_utils.py", line 89, in <module>
    from unsloth_zoo.patching_utils import (
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth_zoo/patching_utils.py", line 629, in <module>
    import transformers.integrations.bitsandbytes
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py", line 20, in <module>
    import bitsandbytes as bnb
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/bitsandbytes/bitsandbytes/__init__.py", line 39, in <module>
    from .backends.xpu import ops as xpu_ops
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/bitsandbytes/bitsandbytes/backends/xpu/ops.py", line 17, in <module>
    if version.parse(torch.__version__).release >= version.parse("2.9"):
TypeError: '>=' not supported between instances of 'tuple' and 'Version'
```
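
A minimal sketch of the fix direction, assuming the intent is to compare release tuples on both sides. The failing line in the traceback compares `.release` (a tuple) with a `Version` object. To stay dependency-free, the hypothetical helper `release_tuple` below extracts the leading numeric segment with a regular expression instead of calling `packaging.version.parse`; it illustrates the idea and is not the code that landed in the fix.

```python
import re
from typing import Tuple


def release_tuple(version_string: str) -> Tuple[int, ...]:
    """Return the leading numeric release segment of a version string.

    For example, '2.9.0.dev20250801+xpu' yields (2, 9, 0). This imitates the
    `.release` attribute of a parsed version (hypothetical stand-in).
    """
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"no release segment in {version_string!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_at_least(installed: str, minimum: str) -> bool:
    """Compare tuple with tuple, so both operands have the same type."""
    return release_tuple(installed) >= release_tuple(minimum)
```

With `packaging`, the equivalent would presumably be `version.parse(torch.__version__).release >= version.parse("2.9").release`. Comparing release tuples treats a nightly such as `2.9.0.dev20250801` as at least 2.9, whereas comparing full `Version` objects orders that dev build before 2.9.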

---------

Signed-off-by: jiqing-feng <[email protected]>
Co-authored-by: jiqing-feng <[email protected]>
Co-authored-by: Er-Xin (Edwin) Shang <[email protected]>
…1692)

* implemented 8bit optimizers

* Add interface

* Commented out torch checks

* Merged

* Updated kernels

* Reused code for quant/dequant

* Removed empty line

* Changed Readme
* Bump minimum PyTorch to 2.3

* Tests: Fix Windows numpy<2 compatibility for torch<2.4.1
…antization (bitsandbytes-foundation#1746)

* Added branchless LUT-based dequantization for FP4 and NF4

* Added extra command line options to control reproducibility

* Restore FP4 quantization/dequantization order
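
For readers unfamiliar with the lookup-table (LUT) approach named in the commit above, here is a small pure-Python sketch of the idea: each 4-bit code indexes directly into a 16-entry table, so no chain of comparisons is needed per value. It is only an illustration, not the kernel code. The function name `dequantize_nf4`, the single per-block `absmax` scale, and the high-nibble-first packing are assumptions, and the table constants should be checked against the library before being relied on.

```python
from typing import List

# NF4 code book (assumed values; verify against the library's own table).
NF4_TABLE = (
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
)


def dequantize_nf4(packed: bytes, absmax: float) -> List[float]:
    """Expand packed 4-bit codes into floats via table lookup.

    Each byte holds two codes; the high nibble is assumed to come first.
    Every output is one table read and one multiply by the block scale.
    """
    values: List[float] = []
    for byte in packed:
        values.append(NF4_TABLE[byte >> 4] * absmax)    # high nibble
        values.append(NF4_TABLE[byte & 0x0F] * absmax)  # low nibble
    return values
```

On a GPU the same table would typically live in registers or shared memory so that every thread performs an identical indexed load, which is what makes the approach branchless.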
…#1757)

* Add function to reverse 4bit weights for HPU

* Fix lint error
@pnunna93 pnunna93 requested a review from lcskrishna September 23, 2025 17:17
@lcskrishna lcskrishna left a comment

LGTM.

@lcskrishna

Skipped unit tests for reference:
359d545

@lcskrishna lcskrishna merged commit 2e65b38 into rocm_enabled Sep 24, 2025
47 checks passed
sstamenk added a commit to sstamenk/bitsandbytes that referenced this pull request Oct 17, 2025
* warpSize is being made non constexpr in ROCm 7.0

* Merge pull request ROCm#90 from ROCm/IFU-rocm_enabled-09-23-2025

Ifu rocm enabled 09 23 2025

* Fix typo

* unskip test_4bit_quant

---------

Co-authored-by: MISHANMAURYA <[email protected]>
Co-authored-by: MISHANMAUYRA <[email protected]>
Co-authored-by: amcamd <[email protected]>
Co-authored-by: Prasanth Nunna <[email protected]>
Co-authored-by: sstamenk <[email protected]>