Closed
Labels: bug (Something isn't working)
Description
There seems to be an issue with BitBLAS when using multi-GPU; even allocating a new tensor triggers the problem:
In [8]: model_loaded.model.layers[0].self_attn.q_proj.device
Out[8]: 0
In [11]: model_loaded.model.layers[16].self_attn.q_proj.device
Out[11]: 1
In [12]: l = model_loaded.model.layers[16].self_attn.q_proj
In [13]: x = torch.randn((1, 4096), device=l.device, dtype=torch.float16)
In [14]: out = l(x) #Runs OK
In [15]: l = model_loaded.model.layers[0].self_attn.q_proj
In [16]: x = torch.randn((1, 4096), device=l.device, dtype=torch.float16)
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[16], line 1
----> 1 x = torch.randn((1, 4096), device=l.device, dtype=torch.float16)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
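Since CUDA errors are reported asynchronously, the call that fails above (the `torch.randn` on layer 0's device) is probably not the real culprit; the illegal access most likely happened in the earlier BitBLAS kernel launched on the other GPU. A minimal debugging sketch, assuming the usual PyTorch workflow hinted at by the error message, to localize where the fault actually occurs:

```python
import os
import torch

# Force synchronous kernel launches so the illegal access is reported at the
# call that actually triggers it, rather than at a later, unrelated CUDA call.
# Must be set before any CUDA work happens in the process.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# ... load model_loaded as above ...
# l = model_loaded.model.layers[16].self_attn.q_proj
# x = torch.randn((1, 4096), device=l.device, dtype=torch.float16)
# out = l(x)                        # with blocking launches, a faulting kernel
# torch.cuda.synchronize(l.device)  # raises here instead of at the next call
```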
Tasks
- [x] Revert the transform weight function implementation of BitBLAS Matmul
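For reference, a common workaround for this class of multi-GPU failures is to make sure each layer's kernels are launched with the GPU that owns its weights set as the current device. The `forward_on_own_device` helper below is purely a sketch under that assumption, not part of the BitBLAS API:

```python
import torch

# Hypothetical helper (not part of BitBLAS): run a layer's forward pass with
# the GPU that holds its weights as the current CUDA device, restoring the
# previous device afterwards. A mismatch between the current device and the
# weight device is a common cause of "illegal memory access" in multi-GPU runs.
def forward_on_own_device(layer, x):
    with torch.cuda.device(layer.device):
        return layer(x)

# Usage mirroring the session above: layer 0's q_proj lives on device 0,
# layer 16's on device 1; each forward runs on its own device.
# out0  = forward_on_own_device(model_loaded.model.layers[0].self_attn.q_proj, x0)
# out16 = forward_on_own_device(model_loaded.model.layers[16].self_attn.q_proj, x16)
```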