forked from bitsandbytes-foundation/bitsandbytes
    
        
Ifu rocm enabled 09 23 2025 #90
Merged
          Conversation
  
    
  
  
    
    …ion#1683) * Port ROCm changes from multi-backend-refactor branch * Update ops.py * Update functional.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update functional.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update functional.py * Update functional.py * Update functional.py * Update functional.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update functional.py * Update functional.py * Update functional.py * Update test_ops.py * Update test_functional.py * Update test_ops.py * Update test_functional.py * Update test_functional.py * Update functional.py * Update functional.py * Update ops.py * Update ops.py * Update test_functional.py * Update test_functional.py * Update cextension.py * Update cuda_specs.py * Update cuda_specs.py * Update test_functional.py * Update test_linear4bit.py * Update test_cuda_setup_evaluator.py * Update test_functional.py * Update modules.py * Update modules.py * Update ops.py * Update test_linear4bit.py * Update ops.py * Update ops.py * Update test_linear4bit.py * Update test_linear4bit.py * Update python-package.yml * Update python-package.yml * Update python-package.yml * Update python-package.yml * Create build-rocm.sh * Update cuda_specs.py * Fix trailing whitespace * Remove conflicts.diff * update for hipblasVersionMajor >=3 * Update test_functional.py * Update test_linear4bit.py * Update test_ops.py * Update main.py * Update test_functional.py * Update test_linear4bit.py * Update test_ops.py * Update test_linear4bit.py * Lint * Lint * Update helpers.py * Update test_functional.py * Update test_linear4bit.py * Update test_ops.py * Lint * Update pythonInterface.cpp * lint fix * lint * Update pythonInterface.cpp * revert permissions change * Fix indentation * Update kernels_hip.cuh * Update kernels.hip * Update ops.hip * Update ops_hip.cuh * Update kernels_hip.cuh * Update kernels.hip * Update kernels.hip * Update ops.hip * Update ops_hip.cuh * Update ops.hip * Update CMakeLists.txt * Update functional.py * Update cextension.py * Update cextension.py --------- Co-authored-by: MISHANMAURYA <[email protected]> Co-authored-by: MISHANMAUYRA <[email protected]> Co-authored-by: amcamd <[email protected]> Co-authored-by: Prasanth Nunna <[email protected]>
* Add CUDA 12.9 to build/test workflows * Downgrade Jimver/cuda-toolkit to v0.2.24 * Update python-package.yml * Update python-package.yml * Update python-package.yml * Update tests.yml * Update tests.yml
Signed-off-by: jiqing-feng <[email protected]>
* Add torch 2.8 rc / 2.9 nightly to tests * Update tests.yml * Update tests.yml
…ation#1512) * Automatically call CMake as part of PEP 517 build Call CMake and build the CPU extension when invoking the build via a PEP 517 backend, to ensure that at least some extension is built when users are building from source. This improves consistency with other Python packages, and reduces the risk of accidents. We are using `scikit-build-core` setuptools plugin to take care of CMake dependencies and call into CMake. However, we need to modify the `build_py` command to ensure that CMake is called prior to the setuptools command, as otherwise the newly built shared library won't be picked up by `build_py`. Since setuptools is still responsible for collecting the Python package, it also collects all other shared libraries that were built earlier, for example via manual CMake calls as done in the CI pipeline. Furthermore, if the user does not have `scikit-build-core` installed and calls `setup.py` directly, we output a warning but continue working as before. The logic can be further extended in the future, for example to detect the best COMPUTE_BACKEND default. Fixes bitsandbytes-foundation#1511 * Include C sources and build files in source distribution * Fix formatting
Signed-off-by: jiqing-feng <[email protected]>
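A minimal sketch of the build ordering described in the PEP 517 commit above, assuming a hypothetical `setup.py`: the real change uses the `scikit-build-core` setuptools plugin, while this sketch calls CMake directly via `subprocess` just to show why the native build has to run before setuptools' `build_py` collects the package (the `COMPUTE_BACKEND=cpu` flag is a placeholder default).

```python
# Hypothetical illustration of running CMake ahead of setuptools' build_py,
# as described in the commit above. Paths and options are placeholders.
import shutil
import subprocess
import warnings

from setuptools import setup
from setuptools.command.build_py import build_py as _build_py


class build_py(_build_py):
    def run(self):
        if shutil.which("cmake") is None:
            warnings.warn("cmake not found; skipping native extension build")
        else:
            # Configure and build the CPU extension before the Python package
            # is collected, so the freshly built shared library is included.
            subprocess.run(["cmake", "-B", "build", "-DCOMPUTE_BACKEND=cpu"], check=True)
            subprocess.run(["cmake", "--build", "build", "--config", "Release"], check=True)
        super().run()


setup(cmdclass={"build_py": build_py})
```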
…/inf_benchmark [XPU] Add inference benchmark for XPU
…/8bit_int Add kernel registration for 8bit and 32bit optimizers
…ndation/add-funding Create FUNDING.yml
…ndation/adjust-cuda-build Add Volta support in cu128/cu129 builds
Signed-off-by: cyy <[email protected]>
* Fix unused variable warnings and other ruff warnings Signed-off-by: cyy <[email protected]> * Fix format Signed-off-by: cyy <[email protected]> --------- Signed-off-by: cyy <[email protected]>
* add int mm for xpu after torch 2.9 Signed-off-by: jiqing-feng <[email protected]> * add packaging on pyproject Signed-off-by: jiqing-feng <[email protected]> --------- Signed-off-by: jiqing-feng <[email protected]>
…foundation#1728) * for intel xpu case, use MatMul8bitFp even not use ipex Signed-off-by: Liu, Kaixuan <[email protected]> * fix lint issue Signed-off-by: Liu, Kaixuan <[email protected]> --------- Signed-off-by: Liu, Kaixuan <[email protected]>
…on#1720) * Add parametrize util for targeting parameters outside of nn.Linear modules * Parametrize 4bit: replace existing prequantized weight * cleanup * Add caching for parametrization * Add tests * Fix tests * Guard for torch < 2.5 * Guard for torch < 2.5 * Another test guard for torch >= 2.5
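For context, a minimal, hypothetical sketch of the `torch.nn.utils.parametrize` mechanism the commit above builds on, targeting a module other than `nn.Linear` (here `nn.Embedding`); the `Dequantize` module and the int8 absmax scheme are illustrative stand-ins, not the bitsandbytes 4-bit format.

```python
# Hypothetical sketch: expose a "prequantized" weight through a parametrization
# so a module other than nn.Linear sees a dequantized tensor on access.
import torch
import torch.nn as nn
from torch.nn.utils import parametrize


class Dequantize(nn.Module):
    def __init__(self, scale: torch.Tensor):
        super().__init__()
        self.register_buffer("scale", scale)

    def forward(self, q: torch.Tensor) -> torch.Tensor:
        # Map the stored (simulated) integer codes back to floats.
        return q * self.scale


emb = nn.Embedding(16, 8)
with torch.no_grad():
    scale = emb.weight.abs().max() / 127
    emb.weight.copy_((emb.weight / scale).round())  # simulate prequantized storage

parametrize.register_parametrization(emb, "weight", Dequantize(scale))
print(emb(torch.tensor([0, 1])).shape)  # weight is dequantized transparently on access
```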
* Test suite improvements for MPS/XPU/HPU * Skip test on torch==2.8.0+cpu for Windows regression
…#1710) * Implemented 32bit optimizers in triton * Modify Comments * Optimizing pure torch implementation * Restore the order of parameters and modify the position of pure pytorch implementation * Restore files permissions --------- Co-authored-by: Fanli Lin <[email protected]>
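For reference, the update these 32-bit optimizer kernels compute is standard Adam; below is a hedged pure-PyTorch sketch of a single step, not the Triton kernel from the commit above.

```python
# Hypothetical pure-PyTorch reference for one 32-bit Adam step, illustrating
# the math a fused kernel performs per parameter tensor.
import torch


def adam_step_fp32(p, g, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m.mul_(beta1).add_(g, alpha=1 - beta1)          # first moment
    v.mul_(beta2).addcmul_(g, g, value=1 - beta2)   # second moment
    bias1 = 1 - beta1**step
    bias2 = 1 - beta2**step
    denom = (v / bias2).sqrt_().add_(eps)
    p.addcdiv_(m, denom, value=-lr / bias1)         # p -= lr * m_hat / (sqrt(v_hat) + eps)


p = torch.randn(1024)
g = torch.randn(1024)
m = torch.zeros_like(p)
v = torch.zeros_like(p)
adam_step_fp32(p, g, m, v, step=1)
```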
* Add SYCL Kernels for XPU backend * fix transpose Signed-off-by: jiqing-feng <[email protected]> * fix log and format Signed-off-by: jiqing-feng <[email protected]> * revert cpu changes Signed-off-by: jiqing-feng <[email protected]> * clean ipex_xpu Signed-off-by: jiqing-feng <[email protected]> * clean ipex import Signed-off-by: jiqing-feng <[email protected]> * fix ipex cpu import Signed-off-by: jiqing-feng <[email protected]> * fix typo Signed-off-by: jiqing-feng <[email protected]> * fix comments Signed-off-by: jiqing-feng <[email protected]> * refine gemv_4bit kernel * enable FP4 for dequant_4bit and gemv_4bit * refine FP4 dequantization performance * remove check for better performance Signed-off-by: jiqing-feng <[email protected]> * fix doc Signed-off-by: jiqing-feng <[email protected]> * clean code * fix tests Signed-off-by: jiqing-feng <[email protected]> * rm comments Signed-off-by: jiqing-feng <[email protected]> * fix memory issue * fix ut failure * adjust threshold Signed-off-by: jiqing-feng <[email protected]> * fix xpu check Signed-off-by: jiqing-feng <[email protected]> * change test_functional check Signed-off-by: jiqing-feng <[email protected]> * fix test_module Signed-off-by: jiqing-feng <[email protected]> * fix device check Signed-off-by: jiqing-feng <[email protected]> * fix tests Signed-off-by: jiqing-feng <[email protected]> * Enable Windows build and refine code * fix xpu log Signed-off-by: jiqing-feng <[email protected]> * remove ipex entirely Signed-off-by: jiqing-feng <[email protected]> * fix cpu int8 CB Signed-off-by: jiqing-feng <[email protected]> * fix lint Signed-off-by: jiqing-feng <[email protected]> * fix logs (#12) * fix logs Signed-off-by: jiqing-feng <[email protected]> * fix format Signed-off-by: jiqing-feng <[email protected]> --------- Signed-off-by: jiqing-feng <[email protected]> * Fix sycl lint error and tests (#13) * fix sycl nd Signed-off-by: jiqing-feng <[email protected]> * fix tests Signed-off-by: jiqing-feng <[email protected]> --------- Signed-off-by: jiqing-feng <[email protected]> * skip typo check for xpu kernel codes (#14) * skip test for xpu ops Signed-off-by: jiqing-feng <[email protected]> * fix lint Signed-off-by: jiqing-feng <[email protected]> * skip typo for xpu Signed-off-by: jiqing-feng <[email protected]> * skip Signed-off-by: jiqing-feng <[email protected]> * skip Signed-off-by: jiqing-feng <[email protected]> --------- Signed-off-by: jiqing-feng <[email protected]> * register triton kernel for quantization (#15) Signed-off-by: jiqing-feng <[email protected]> * Fix version comparison issue (#18) # Description The version comparison expression miss reference the .release property from the version object. This lead to compare between the tuple and the string # Error message ``` The 8-bit optimizer is not available on your device, only available on CUDA for now. 🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning. 
Traceback (most recent call last):
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/unsloth_validation/run.py", line 1, in <module>
    import unsloth
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/__init__.py", line 235, in <module>
    from .models import *
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/__init__.py", line 15, in <module>
    from .llama import FastLlamaModel
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/llama.py", line 23, in <module>
    from ._utils import *
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/_utils.py", line 89, in <module>
    from unsloth_zoo.patching_utils import (
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth_zoo/patching_utils.py", line 629, in <module>
    import transformers.integrations.bitsandbytes
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py", line 20, in <module>
    import bitsandbytes as bnb
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/bitsandbytes/bitsandbytes/__init__.py", line 39, in <module>
    from .backends.xpu import ops as xpu_ops
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/bitsandbytes/bitsandbytes/backends/xpu/ops.py", line 17, in <module>
    if version.parse(torch.__version__).release >= version.parse("2.9"):
TypeError: '>=' not supported between instances of 'tuple' and 'Version'
```
--------- Signed-off-by: jiqing-feng <[email protected]> Co-authored-by: jiqing-feng <[email protected]> Co-authored-by: Er-Xin (Edwin) Shang <[email protected]>
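The essence of the fix, as a short sketch (the exact change may differ in detail; the guard lives in `bitsandbytes/backends/xpu/ops.py`): `Version.release` is a tuple of ints, so it has to be compared against a tuple rather than a `Version` object.

```python
# Sketch of the version-gate fix described above.
from packaging import version
import torch

# Buggy: .release is a tuple of ints, version.parse("2.9") is a Version object,
# and tuple >= Version raises the TypeError shown in the traceback.
# if version.parse(torch.__version__).release >= version.parse("2.9"): ...

# Fixed: compare the release tuple against a plain tuple (equivalently, take
# .release on both sides). Using .release also treats nightly builds such as
# "2.9.0a0+git..." as 2.9, whereas a full Version comparison would classify
# them as pre-2.9 releases.
if version.parse(torch.__version__).release >= (2, 9):
    pass  # register the torch>=2.9-only XPU paths here
```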
…1692) * implemented 8bit optimizers * Add interface * Commented out torch checks * Merged * Updated kernels * Reused code for quant/dequant * Removed empty line * Changed Readme
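For orientation, a hedged sketch of the blockwise absmax quantize/dequantize pattern that 8-bit optimizer states rely on; this simplified stand-in uses a plain linear int8 code rather than the non-linear quantization map used by the real kernels.

```python
# Hypothetical blockwise int8 quantize/dequantize of an optimizer state.
import torch

BLOCK = 256


def quantize_blockwise(x: torch.Tensor):
    xb = x.reshape(-1, BLOCK)
    absmax = xb.abs().amax(dim=1, keepdim=True).clamp_min(1e-12)  # per-block scale
    q = torch.round(xb / absmax * 127).to(torch.int8)
    return q, absmax


def dequantize_blockwise(q: torch.Tensor, absmax: torch.Tensor):
    return (q.to(torch.float32) / 127) * absmax


state = torch.randn(4096)
q, absmax = quantize_blockwise(state)
recon = dequantize_blockwise(q, absmax).reshape(-1)
print((state - recon).abs().max())  # small per-block quantization error
```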
* Bump minimum PyTorch to 2.3 * Tests: Fix Windows numpy<2 compatibility for torch<2.4.1
…antization (bitsandbytes-foundation#1746) * Added branchless LUT-based dequantization for FP4 and NF4 * Added extra command line options to control reproducibility * Restore FP4 quantization/dequantization order
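A small sketch of the branchless LUT idea from the commit above: each 4-bit code indexes a 16-entry lookup table, so no per-element branching on the code value is needed (the table below is a placeholder, not the actual FP4/NF4 value set).

```python
# Hypothetical branchless 4-bit dequantization via a 16-entry lookup table.
import torch

lut = torch.linspace(-1.0, 1.0, 16)  # placeholder code table, not FP4/NF4 values


def dequant_4bit(packed: torch.Tensor, absmax: torch.Tensor) -> torch.Tensor:
    # Each uint8 holds two 4-bit codes; unpack, then gather from the table.
    hi = (packed >> 4).long()
    lo = (packed & 0x0F).long()
    codes = torch.stack((hi, lo), dim=-1).reshape(packed.shape[0], -1)
    return lut[codes] * absmax  # table indexing replaces per-code branching


packed = torch.randint(0, 256, (8, 16), dtype=torch.uint8)
out = dequant_4bit(packed, absmax=torch.ones(8, 1))
print(out.shape)  # (8, 32)
```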
…#1757) * Add function to reverse 4bit weights for HPU * Fix lint error
              
lcskrishna approved these changes Sep 24, 2025
LGTM.
Skipped unit tests for reference:
    
sstamenk added a commit to sstamenk/bitsandbytes that referenced this pull request on Oct 17, 2025:
* Port ROCm changes from multi-backend-refactor branch * Update ops.py * Update functional.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update functional.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update functional.py * Update functional.py * Update functional.py * Update functional.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update functional.py * Update functional.py * Update functional.py * Update test_ops.py * Update test_functional.py * Update test_ops.py * Update test_functional.py * Update test_functional.py * Update functional.py * Update functional.py * Update ops.py * Update ops.py * Update test_functional.py * Update test_functional.py * Update cextension.py * Update cuda_specs.py * Update cuda_specs.py * Update test_functional.py * Update test_linear4bit.py * Update test_cuda_setup_evaluator.py * Update test_functional.py * Update modules.py * Update modules.py * Update ops.py * Update test_linear4bit.py * Update ops.py * Update ops.py * Update test_linear4bit.py * Update test_linear4bit.py * Update python-package.yml * Update python-package.yml * Update python-package.yml * Update python-package.yml * Create build-rocm.sh * Update cuda_specs.py * Fix trailing whitespace * Remove conflicts.diff * update for hipblasVersionMajor >=3 * Update test_functional.py * Update test_linear4bit.py * Update test_ops.py * Update main.py * Update test_functional.py * Update test_linear4bit.py * Update test_ops.py * Update test_linear4bit.py * Lint * Lint * Update helpers.py * Update test_functional.py * Update test_linear4bit.py * Update test_ops.py * Lint * Update pythonInterface.cpp * lint fix * lint * Update pythonInterface.cpp * revert permissions change * Fix indentation * Update kernels_hip.cuh * Update kernels.hip * Update ops.hip * Update ops_hip.cuh * Update kernels_hip.cuh * Update kernels.hip * Update kernels.hip * Update ops.hip * Update ops_hip.cuh * Update ops.hip * Update CMakeLists.txt * Update functional.py * Update cextension.py * Update cextension.py * warpSize is being made non constexpr in ROCm 7.0 * Merge pull request ROCm#90 from ROCm/IFU-rocm_enabled-09-23-2025 Ifu rocm enabled 09 23 2025 * Fix typo * unskip test_4bit_quant --------- Co-authored-by: MISHANMAURYA <[email protected]> Co-authored-by: MISHANMAUYRA <[email protected]> Co-authored-by: amcamd <[email protected]> Co-authored-by: Prasanth Nunna <[email protected]> Co-authored-by: sstamenk <[email protected]>
  
  
    
  
    
Motivation
Merge latest upstream changes into ROCm fork
Test Plan
Reviewed all unit tests
Test Result
==================================== PASSES ====================================
= 2998 passed, 901 skipped, 184 deselected, 18 xfailed, 763 warnings in 577.31s (0:09:37) =
Submission Checklist