Closed
Labels: bug (Something isn't working)
Description
There seems to be an issue with BitBLAS when using multi-GPU; even allocating a new tensor triggers the problem:
In [8]: model_loaded.model.layers[0].self_attn.q_proj.device
Out[8]: 0
In [11]: model_loaded.model.layers[16].self_attn.q_proj.device
Out[11]: 1
In [12]: l = model_loaded.model.layers[16].self_attn.q_proj
In [13]: x = torch.randn((1, 4096), device=l.device, dtype=torch.float16)
In [14]: out = l(x) #Runs OK
In [15]: l = model_loaded.model.layers[0].self_attn.q_proj
In [16]: x = torch.randn((1, 4096), device=l.device, dtype=torch.float16)
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[16], line 1
----> 1 x = torch.randn((1, 4096), device=l.device, dtype=torch.float16)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
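Since CUDA errors are reported asynchronously, the call that fails above (the `torch.randn` on layer 0's device) is probably not the real culprit; the illegal access most likely happened in the earlier BitBLAS kernel launched on the other GPU. A minimal debugging sketch, assuming the usual PyTorch workflow hinted at by the error message, to localize where the fault actually occurs:

```python
import os
import torch

# Force synchronous kernel launches so the illegal access is reported at the
# call that actually triggers it, rather than at a later, unrelated CUDA call.
# Must be set before any CUDA work happens in the process.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# ... load model_loaded as above ...
# l = model_loaded.model.layers[16].self_attn.q_proj
# x = torch.randn((1, 4096), device=l.device, dtype=torch.float16)
# out = l(x)                        # with blocking launches, a faulting kernel
# torch.cuda.synchronize(l.device)  # raises here instead of at the next call
```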
Tasks
- [x] Revert the transform weight function implementation of BitBLAS Matmul
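For reference, a common workaround for this class of multi-GPU failures is to make sure each layer's kernels are launched with the GPU that owns its weights set as the current device. The `forward_on_own_device` helper below is purely a sketch under that assumption, not part of the BitBLAS API:

```python
import torch

# Hypothetical helper (not part of BitBLAS): run a layer's forward pass with
# the GPU that holds its weights as the current CUDA device, restoring the
# previous device afterwards. A mismatch between the current device and the
# weight device is a common cause of "illegal memory access" in multi-GPU runs.
def forward_on_own_device(layer, x):
    with torch.cuda.device(layer.device):
        return layer(x)

# Usage mirroring the session above: layer 0's q_proj lives on device 0,
# layer 16's on device 1; each forward runs on its own device.
# out0  = forward_on_own_device(model_loaded.model.layers[0].self_attn.q_proj, x0)
# out16 = forward_on_own_device(model_loaded.model.layers[16].self_attn.q_proj, x16)
```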