Description
I encountered a CUDA error while running a script that uses the Llama model. The error message is "CUDA error 801 at ggml-cuda.cu:6799: operation not supported", reported on device 0.
Code Snippet:
from llama_cpp import Llama

def question(message):
    # LLM setup: load the GGUF model and offload 32 layers to the GPU
    llm = Llama(model_path="./japanese-stablelm-instruct-gamma-7b-q8_0.gguf",
                n_gpu_layers=32)
    # Build the prompt from the user message (the original script defines
    # `prompt` elsewhere; this instruction-style template is an assumption
    # based on the stop tokens below)
    prompt = f"指示: {message}\n応答: "
    # Run inference
    output = llm(
        prompt,
        temperature=1,
        top_p=0.95,
        stop=["指示:", "入力:", "応答:"],
        echo=False,
        max_tokens=1024
    )
    return output
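For reference, the function is invoked roughly as below. This is a minimal sketch; the input text and the way the result is unpacked are assumptions, since only the snippet above comes from the original script (llama-cpp-python returns a completion dict whose generated text is in choices[0]["text"]).

# Hypothetical usage -- the real calling code is not shown above
result = question("日本で一番高い山は何ですか？")  # "What is the highest mountain in Japan?"
# Extract the generated text from the completion dict
print(result["choices"][0]["text"])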
Error Message:
llm_load_tensors: ggml ctx size = 0.11 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required = 132.92 MB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 35/35 layers to GPU
llm_load_tensors: VRAM used: 7205.83 MB
...................................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: offloading v cache to GPU
llama_kv_cache_init: offloading k cache to GPU
llama_kv_cache_init: VRAM kv self = 64.00 MB
llama_new_context_with_model: kv self size = 64.00 MB
llama_build_graph: non-view tensors processed: 740/740
llama_new_context_with_model: compute buffer total size = 79.63 MB
llama_new_context_with_model: VRAM scratch buffer: 73.00 MB
llama_new_context_with_model: total VRAM used: 7342.83 MB (model: 7205.83 MB, context: 137.00 MB)
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
CUDA error 801 at ggml-cuda.cu:6799: operation not supported
current device: 0
Environment:
NVIDIA-SMI 545.23.06
Driver Version: 545.23.06
CUDA Version: 12.3
GPU: NVIDIA Quadro M4000 (8 GB)
Any help in resolving this issue would be greatly appreciated.