Add SYCL Kernels for XPU backend #1679
fix transpose
Signed-off-by: jiqing-feng <[email protected]>
revert cpu changes
Signed-off-by: jiqing-feng <[email protected]>
remove check for better performance
Signed-off-by: jiqing-feng <[email protected]>
| Can we use a more accurate title for the commit? Otherwise reviewers may be confused about whether all SYCL kernels are included in the PR. | 
* fix sycl nd Signed-off-by: jiqing-feng <[email protected]> * fix tests Signed-off-by: jiqing-feng <[email protected]> --------- Signed-off-by: jiqing-feng <[email protected]>
| # SYCL should be faster for xpu, so at first checking if it is available. | ||
| if not isinstance(lib, ErrorHandlerMockBNBNativeLibrary): | 
Currently you pick either all methods from SYCL or all methods from Triton. However, the SYCL implementation is currently missing these methods, which are available in Triton:
quantize_blockwise
quantize_4bit
I suggest we keep using these Triton methods even with SYCL, since that's the only option on XPU for now.
These two kernels don't affect QLoRA performance. They currently run with PyTorch ops by default, and we will implement them as SYCL kernels later.
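The per-op fallback suggested in this thread could look roughly like the sketch below. It is an illustration only, not the code in this PR: `build_dispatch`, the two op tables, and the op name `op_with_sycl_kernel` are hypothetical, and the lambdas stand in for real kernels.

```python
from typing import Callable, Dict, List, Tuple

Backend = Tuple[str, Dict[str, Callable[[], str]]]


def build_dispatch(backends: List[Backend]) -> Dict[str, Tuple[str, Callable[[], str]]]:
    """Map each op name to the first backend (in priority order) that provides it."""
    table: Dict[str, Tuple[str, Callable[[], str]]] = {}
    for backend_name, ops in backends:
        for op_name, fn in ops.items():
            # setdefault keeps the entry from the higher-priority backend.
            table.setdefault(op_name, (backend_name, fn))
    return table


# Hypothetical op tables; the lambdas are stand-ins for real kernels.
sycl_ops = {"op_with_sycl_kernel": lambda: "sycl"}
triton_ops = {
    "op_with_sycl_kernel": lambda: "triton",
    "quantize_blockwise": lambda: "triton",
    "quantize_4bit": lambda: "triton",
}

# SYCL is listed first, so it wins wherever it has a kernel.
dispatch = build_dispatch([("sycl", sycl_ops), ("triton", triton_ops)])
```

With this ordering, ops that SYCL provides resolve to SYCL, while `quantize_blockwise` and `quantize_4bit` fall through to the Triton entries instead of being dropped.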
| The implementation is missing the following methods: | 
| Hi @Egor-Krivov. Could you share the script that reproduces this error? | 
| @Egor-Krivov, these kernels have been implemented already.
 @Egor-Krivov, these kernels are already implemented as SYCL kernels. | 
| Hi @matthewdouglas . Could you please trigger the CI for this PR? Thanks! | 
| This PR is ready for review now. Please reach out if you have any other questions. Thanks! | 
| I'm working on performance testing of unsloth right now. These methods are used for the CUDA implementation here: I am working with a POC branch (not merged upstream) from https://github.com/leizhenyuan/unsloth/blob/7bed913255f611e220c2d219ee988c179ed98033/unsloth/kernels/utils.py#L154 For me, the call happens in the last 2 lines of my script, which is essentially a copy of the unsloth tutorial:  | 
| The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. | 
| Hi @matthewdouglas . The lint test failed with error fix. See this comment. Do you know how to skip xpu kernels on typo test? | 
* skip test for xpu ops Signed-off-by: jiqing-feng <[email protected]> * fix lint Signed-off-by: jiqing-feng <[email protected]> * skip typo for xpu Signed-off-by: jiqing-feng <[email protected]> * skip Signed-off-by: jiqing-feng <[email protected]> * skip Signed-off-by: jiqing-feng <[email protected]> --------- Signed-off-by: jiqing-feng <[email protected]>
Signed-off-by: jiqing-feng <[email protected]>
| Hi @matthewdouglas . Please trigger the tests and review this PR. Thanks! | 
Signed-off-by: jiqing-feng <[email protected]>
# Description
The version comparison expression is missing the `.release` property on one of the two version objects. As a result, a tuple is compared against a `Version` object, which raises a `TypeError`.
# Error message
```
The 8-bit optimizer is not available on your device, only available on CUDA for now.
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
Traceback (most recent call last):
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/unsloth_validation/run.py", line 1, in <module>
    import unsloth
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/__init__.py", line 235, in <module>
    from .models import *
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/__init__.py", line 15, in <module>
    from .llama     import FastLlamaModel
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/llama.py", line 23, in <module>
    from ._utils import *
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/_utils.py", line 89, in <module>
    from unsloth_zoo.patching_utils import (
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth_zoo/patching_utils.py", line 629, in <module>
    import transformers.integrations.bitsandbytes
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py", line 20, in <module>
    import bitsandbytes as bnb
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/bitsandbytes/bitsandbytes/__init__.py", line 39, in <module>
    from .backends.xpu import ops as xpu_ops
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/bitsandbytes/bitsandbytes/backends/xpu/ops.py", line 17, in <module>
    if version.parse(torch.__version__).release >= version.parse("2.9"):
TypeError: '>=' not supported between instances of 'tuple' and 'Version'
```
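A minimal sketch of the fix described above, assuming `version` in the traceback is `packaging.version`. The helper names `torch_at_least` and `mixed_comparison_fails` are hypothetical and are not functions in bitsandbytes; the point is only that both sides of the comparison must have the same type.

```python
from packaging import version


def torch_at_least(installed: str, minimum: str) -> bool:
    """Compare release tuples on both sides, e.g. (2, 9, 0) >= (2, 9)."""
    return version.parse(installed).release >= version.parse(minimum).release


def mixed_comparison_fails(installed: str, minimum: str) -> bool:
    """Reproduce the bug: a tuple compared against a Version raises TypeError."""
    try:
        version.parse(installed).release >= version.parse(minimum)
    except TypeError:
        return True
    return False
```

Comparing `.release` on both sides ignores pre-release and local suffixes such as `.dev` or `+xpu`, so a 2.9 nightly build counts as 2.9. Comparing the two `Version` objects directly is also type-safe, but it orders a `.dev` build before the final release.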
    | Hi all, there are a few small lint issues to fix (I'll take care of them!). Apart from that, it would be great if we could add the XPU backend build to our existing workflow in  Thanks! | 
| As discussed on Slack, we can follow up with separate PRs for things like packaging. | 
1813b05 into bitsandbytes-foundation:main
This is the pull request for the SYCL Kernels targeting the XPU backend.