Your current environment

No response
Model Input Dumps
No response
🐛 Describe the bug
I am trying to start the Qwen/Qwen2-VL-72B-Instruct-AWQ model with vLLM in Docker, on a server with 5× RTX 3090 Ti. Llama 70B and other models work fine, but this model raises an exception during the download process, right after the config files are fetched. The same problem occurs with the Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4 model. vLLM is the latest, v0.6.2.
Start command in Docker:

```bash
sudo docker run --ipc=host --log-opt max-size=10m --log-opt max-file=1 --rm -it \
  --gpus '"device=1,2,3,4"' -p 9000:8000 \
  --mount type=bind,source=/home/me/.cache,target=/root/.cache \
  vllm/vllm-openai:v0.6.2 \
  --model Qwen/Qwen2-VL-72B-Instruct-AWQ \
  --tensor-parallel-size 4 --gpu-memory-utilization 0.92 \
  --max-model-len 8000 --dtype half -q awq --disable-log-requests
```
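For context, the failure seems to come from the model's `rope_scaling` config rather than the download itself. Below is a minimal sketch of the dicts involved, assuming the `rope_scaling` block that Qwen2-VL models are known to ship; the exact `mrope_section` values here are illustrative, not copied from the AWQ repo:

```python
# Hypothetical reconstruction of the relevant piece of config.json and of
# what transformers hands to vLLM. Values are assumptions for illustration.

# As published for Qwen2-VL: multimodal RoPE ("mrope").
rope_scaling_on_disk = {
    "type": "mrope",
    "mrope_section": [16, 24, 24],  # temporal/height/width split of the head dim
}

# Qwen2VLConfig in transformers rewrites "mrope" to "default" for backward
# compatibility, leaving "mrope_section" behind -- which matches the
# "Unrecognized keys in `rope_scaling` for 'rope_type'='default'" warning
# in the log below.
rope_scaling_as_seen_by_vllm = {
    "rope_type": "default",
    "mrope_section": [16, 24, 24],
}
```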
Exception:

```
INFO 10-05 01:33:20 api_server.py:177] Started engine process with PID 77
config.json: 100%|██████████████████████████████████████████████████████████████████████████████| 1.39k/1.39k [00:00<00:00, 19.0MB/s]
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
preprocessor_config.json: 100%|█████████████████████████████████████████████████████████████████████| 594/594 [00:00<00:00, 2.26MB/s]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 571, in <module>
    uvloop.run(run_server(args))
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
    return __asyncio.run(
           ^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
           ^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 538, in run_server
    async with build_async_engine_client(args) as engine_client:
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 105, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 182, in build_async_engine_client_from_engine_args
    engine_config = engine_args.create_engine_config()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 874, in create_engine_config
    model_config = self.create_model_config()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 811, in create_model_config
    return ModelConfig(
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 207, in __init__
    self.max_model_len = _get_and_verify_max_len(
                         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 1746, in _get_and_verify_max_len
    assert "factor" in rope_scaling
           ^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 388, in run_mp_engine
    engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 134, in from_engine_args
    engine_config = engine_args.create_engine_config()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 874, in create_engine_config
    model_config = self.create_model_config()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 811, in create_model_config
    return ModelConfig(
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 207, in __init__
    self.max_model_len = _get_and_verify_max_len(
                         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 1746, in _get_and_verify_max_len
    assert "factor" in rope_scaling
           ^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
```
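The assertion is easy to reproduce in isolation. Below is a minimal sketch (not vLLM's actual code) of the length check in `_get_and_verify_max_len` as the traceback describes it; the list of exempted rope types is an assumption for illustration:

```python
# Stand-in for the check in vllm/config.py::_get_and_verify_max_len (v0.6.2);
# simplified, not vLLM's actual code. The dict mirrors what the engine
# receives according to the "Unrecognized keys" warning above.
rope_scaling = {"rope_type": "default", "mrope_section": [16, 24, 24]}

rope_type = rope_scaling.get("rope_type", rope_scaling.get("type"))

# Assumption: vLLM exempts a few rope types from the "factor" requirement;
# the exact names here are illustrative.
if rope_type not in ("su", "longrope", "llama3"):
    # Every other scaling type is expected to carry a "factor" used to
    # stretch max_model_len. The normalized mrope entry has no "factor",
    # so this is the assert at config.py:1746 in the traceback.
    assert "factor" in rope_scaling  # -> AssertionError
```

Note that `--max-model-len 8000` is already passed on the command line, so the failure happens during config validation regardless of the requested context length.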
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.