Skip to content

[Bug]: loading squeezellm model #5190

@yuhuixu1993

Description

@yuhuixu1993

Your current environment

I used 0.4.3 version, pip install, cuda vsesion 12.0, A100 GPU

RuntimeError: t == DeviceType::CUDA INTERNAL ASSERT FAILED

🐛 Describe the bug

INFO 06-02 03:21:56 model_runner.py:173] Loading model weights took 12.1389 GB
Traceback (most recent call last):
  File "/export/aiops-data/yuhui/wanda/benchmark_throughput_xgen.py", line 402, in <module>
    main(args)
  File "/export/aiops-data/yuhui/wanda/benchmark_throughput_xgen.py", line 221, in main
    elapsed_time = run_vllm(
  File "/export/aiops-data/yuhui/wanda/benchmark_throughput_xgen.py", line 85, in run_vllm
    llm = LLM(
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/vllm/entrypoints/llm.py", line 118, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 277, in from_engine_args
    engine = cls(
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 160, in __init__
    self._initialize_kv_caches()
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 236, in _initialize_kv_caches
    self.model_executor.determine_num_available_blocks())
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/vllm/executor/gpu_executor.py", line 111, in determine_num_available_blocks
    return self.driver_worker.determine_num_available_blocks()
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/vllm/worker/worker.py", line 138, in determine_num_available_blocks
    self.model_runner.profile_run()
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 927, in profile_run
    self.execute_model(seqs, kv_caches)
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 848, in execute_model
    hidden_states = model_executable(**execute_model_kwargs)
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 360, in forward
    hidden_states = self.model(input_ids, positions, kv_caches,
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 286, in forward
    hidden_states, residual = layer(
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 228, in forward
    hidden_states = self.self_attn(
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 165, in forward
    q, k = self.rotary_emb(positions, q, k)
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/vllm/model_executor/layers/rotary_embedding.py", line 155, in forward
    ops.rotary_embedding(positions, query, key, self.head_size,
  File "/miniconda/envs/sqllm/lib/python3.9/site-packages/vllm/_custom_ops.py", line 89, in rotary_embedding
    vllm_ops.rotary_embedding(positions, query, key, head_size, cos_sin_cache,
RuntimeError: t == DeviceType::CUDA INTERNAL ASSERT FAILED at "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/torch/include/c10/cuda/impl/CUDAGuardImpl.h":25, please report a bug to PyTorch. 
Exception raised from CUDAGuardImpl at /opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/torch/include/c10/cuda/impl/CUDAGuardImpl.h:25 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7d84d8d81d87 in /miniconda/envs/sqllm/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0x68 (0x7d84d8d32828 in /miniconda/envs/sqllm/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #2: rotary_embedding(at::Tensor&, at::Tensor&, at::Tensor&, int, at::Tensor&, bool) + 0xdf5 (0x7d837cd36db5 in /miniconda/envs/sqllm/lib/python3.9/site-packages/vllm/_C.cpython-39-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x9b06a (0x7d837cd5b06a in /miniconda/envs/sqllm/lib/python3.9/site-packages/vllm/_C.cpython-39-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x98263 (0x7d837cd58263 in /miniconda/envs/sqllm/lib/python3.9/site-packages/vllm/_C.cpython-39-x86_64-linux-gnu.so)
frame #5: python() [0x507397]
<omitting python frames>
frame #8: python() [0x4f80b3]
frame #10: python() [0x4e69da]
frame #12: python() [0x505131]
frame #14: python() [0x4e69da]
frame #16: python() [0x505131]
frame #18: python() [0x4e69da]
frame #21: python() [0x5cb113]
frame #24: python() [0x4e69da]
frame #25: python() [0x50509d]
frame #28: python() [0x4e69da]
frame #29: python() [0x50509d]
frame #32: python() [0x4e69da]
frame #36: python() [0x5cb113]
frame #39: python() [0x4f80b3]
frame #40: python() [0x505131]
frame #42: python() [0x4e69da]
frame #44: python() [0x505131]
frame #46: python() [0x4e69da]
frame #49: python() [0x5cb113]
frame #52: python() [0x4e69da]
frame #54: python() [0x505131]
frame #56: python() [0x4e69da]
frame #58: python() [0x505131]
frame #60: python() [0x4e69da]
frame #63: python() [0x5cb113]

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingstaleOver 90 days of inactivity

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions