CUDA error: an illegal memory access was encountered when using BitBlas on multiple GPUs #154

@mobicham

Description

There seems to be an issue with BitBLAS when using multiple GPUs: after running a layer on one GPU, even allocating a new tensor on the other GPU triggers an illegal memory access:

```python
In [8]: model_loaded.model.layers[0].self_attn.q_proj.device
Out[8]: 0

In [11]: model_loaded.model.layers[16].self_attn.q_proj.device
Out[11]: 1

In [12]: l = model_loaded.model.layers[16].self_attn.q_proj
In [13]: x = torch.randn((1, 4096), device=l.device, dtype=torch.float16)
In [14]: out = l(x)  # runs OK

In [15]: l = model_loaded.model.layers[0].self_attn.q_proj
In [16]: x = torch.randn((1, 4096), device=l.device, dtype=torch.float16)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[16], line 1
----> 1 x = torch.randn((1, 4096), device=l.device, dtype=torch.float16)

RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
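
One thing worth checking (a diagnostic sketch, not a confirmed fix; `model_loaded` and the layer layout come from the session above, and the int-to-device conversion assumes `.device` returns a bare index as shown there) is whether the failure persists when each call is wrapped in an explicit device guard, since custom kernels launched while the wrong GPU is current are a common cause of this class of error:

```python
import torch

# Diagnostic sketch: make each layer's GPU the current device before
# allocating and launching, then synchronize so async errors surface here.
for layer_idx in (16, 0):
    l = model_loaded.model.layers[layer_idx].self_attn.q_proj
    # In the session above, `.device` returns a bare index (0 or 1).
    dev = torch.device("cuda", l.device) if isinstance(l.device, int) else l.device
    with torch.cuda.device(dev):
        x = torch.randn((1, 4096), device=dev, dtype=torch.float16)
        out = l(x)
        torch.cuda.synchronize(dev)
    print(f"layer {layer_idx} OK on {dev}")
```

If the guarded version still faults, the problem is likely inside the kernel itself rather than in the launch context; running with `CUDA_LAUNCH_BLOCKING=1`, as the traceback suggests, should then point at the exact faulting launch.
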
### Tasks
- [x] revert the transform weight function implementation of BitBLAS Matmul (see the smoke-test sketch below for verifying the fix)
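
Once the revert lands, a cross-device smoke test along these lines (a sketch reusing `model_loaded` and the `(1, 4096)` input shape from the session above) should pass cleanly on both GPUs:

```python
import torch

# Smoke-test sketch: run every q_proj once on its own GPU and synchronize
# after each call, so an illegal memory access surfaces at the exact layer.
for i, layer in enumerate(model_loaded.model.layers):
    l = layer.self_attn.q_proj
    dev = torch.device("cuda", l.device) if isinstance(l.device, int) else l.device
    x = torch.randn((1, 4096), device=dev, dtype=torch.float16)
    out = l(x)
    torch.cuda.synchronize(dev)
    print(f"layer {i}: OK on {dev}")
```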
