Closed
Labels: bug (Something isn't working)
Description
Your current environment
INFO 09-01 13:49:38 [__init__.py:241] Automatically detected platform cuda.
(APIServer pid=91) INFO 09-01 13:49:38 [api_server.py:1805] vLLM API server version 0.10.1.1
(APIServer pid=91) INFO 09-01 13:49:38 [utils.py:326] non-default args: {'port': 8001, 'uvicorn_log_level': 'error', 'api_key': ['key_123'], 'model': 'Qwen/Qwen2.5-14B-Instruct-1M', 'dtype': 'half', 'max_model_len': 1010000, 'enforce_eager': True, 'served_model_name': ['model'], 'gpu_memory_utilization': 0.8, 'max_num_batched_tokens': 131072, 'max_num_seqs': 32, 'enable_chunked_prefill': True}
(APIServer pid=91) INFO 09-01 13:49:43 [__init__.py:711] Resolved architecture: Qwen2ForCausalLM
(APIServer pid=91) `torch_dtype` is deprecated! Use `dtype` instead!
(APIServer pid=91) WARNING 09-01 13:49:43 [__init__.py:2819] Casting torch.bfloat16 to torch.float16.
(APIServer pid=91) INFO 09-01 13:49:43 [__init__.py:1750] Using max model len 1010000
(APIServer pid=91) INFO 09-01 13:49:43 [scheduler.py:222] Chunked prefill is enabled with max_num_batched_tokens=131072.
(APIServer pid=91) INFO 09-01 13:49:45 [weight_utils.py:254] Loaded sparse attention config from /root/.cache/huggingface/hub/models--Qwen--Qwen2.5-14B-Instruct-1M/snapshots/620fad32de7bdd2293b3d99b39eba2fe63e97438/sparse_attention_config.json
(APIServer pid=91) INFO 09-01 13:49:45 [__init__.py:3565] Cudagraph is disabled under eager mode
INFO 09-01 13:49:48 [__init__.py:241] Automatically detected platform cuda.
(EngineCore_0 pid=1425) INFO 09-01 13:49:49 [core.py:636] Waiting for init message from front-end.
(EngineCore_0 pid=1425) INFO 09-01 13:49:49 [core.py:74] Initializing a V1 LLM engine (v0.10.1.1) with config: model='Qwen/Qwen2.5-14B-Instruct-1M', speculative_config=None, tokenizer='Qwen/Qwen2.5-14B-Instruct-1M', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=1010000, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=model, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":null,"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":0,"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"pass_config":{},"max_capture_size":0,"local_cache_dir":null}
(EngineCore_0 pid=1425) INFO 09-01 13:49:50 [parallel_state.py:1134] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
(EngineCore_0 pid=1425) INFO 09-01 13:49:50 [topk_topp_sampler.py:50] Using FlashInfer for top-p & top-k sampling.
(EngineCore_0 pid=1425) INFO 09-01 13:49:50 [gpu_model_runner.py:1953] Starting to load model Qwen/Qwen2.5-14B-Instruct-1M...
(EngineCore_0 pid=1425) INFO 09-01 13:49:50 [gpu_model_runner.py:1985] Loading model from scratch...
(EngineCore_0 pid=1425) INFO 09-01 13:49:50 [cuda.py:328] Using Flash Attention backend on V1 engine.
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] EngineCore failed to start.
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] Traceback (most recent call last):
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 691, in run_engine_core
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 492, in __init__
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 80, in __init__
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] self.model_executor = executor_class(vllm_config)
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] File "/app/.venv/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] self._init_executor()
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] File "/app/.venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 49, in _init_executor
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] self.collective_rpc("load_model")
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] File "/app/.venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] File "/app/.venv/lib/python3.12/site-packages/vllm/utils/__init__.py", line 3007, in run_method
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] return func(*args, **kwargs)
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] File "/app/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 212, in load_model
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] File "/app/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1986, in load_model
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] self.model = model_loader.load_model(
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/model_loader/base_loader.py", line 44, in load_model
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] model = initialize_model(vllm_config=vllm_config,
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/model_loader/utils.py", line 63, in initialize_model
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] return model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py", line 465, in __init__
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] self.model = Qwen2Model(vllm_config=vllm_config,
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] File "/app/.venv/lib/python3.12/site-packages/vllm/compilation/decorators.py", line 183, in __init__
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py", line 316, in __init__
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] self.start_layer, self.end_layer, self.layers = make_layers(
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] ^^^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 641, in make_layers
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py", line 318, in <lambda>
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] lambda prefix: decoder_layer_type(config=config,
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py", line 216, in __init__
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] self.self_attn = Qwen2Attention(
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] ^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py", line 162, in __init__
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] self.attn = Attention(
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] ^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] File "/app/.venv/lib/python3.12/site-packages/vllm/attention/layer.py", line 175, in __init__
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] self.impl = impl_cls(num_heads, head_size, scale, num_kv_heads,
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] TypeError: FlashAttentionImpl.__init__() got an unexpected keyword argument 'layer_idx'
(EngineCore_0 pid=1425) Process EngineCore_0:
(EngineCore_0 pid=1425) Traceback (most recent call last):
(EngineCore_0 pid=1425) File "/root/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_0 pid=1425) self.run()
(EngineCore_0 pid=1425) File "/root/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_0 pid=1425) self._target(*self._args, **self._kwargs)
(EngineCore_0 pid=1425) File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 704, in run_engine_core
(EngineCore_0 pid=1425) raise e
(EngineCore_0 pid=1425) File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 691, in run_engine_core
(EngineCore_0 pid=1425) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_0 pid=1425) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 492, in __init__
(EngineCore_0 pid=1425) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_0 pid=1425) File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 80, in __init__
(EngineCore_0 pid=1425) self.model_executor = executor_class(vllm_config)
(EngineCore_0 pid=1425) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) File "/app/.venv/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_0 pid=1425) self._init_executor()
(EngineCore_0 pid=1425) File "/app/.venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 49, in _init_executor
(EngineCore_0 pid=1425) self.collective_rpc("load_model")
(EngineCore_0 pid=1425) File "/app/.venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_0 pid=1425) answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_0 pid=1425) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) File "/app/.venv/lib/python3.12/site-packages/vllm/utils/__init__.py", line 3007, in run_method
(EngineCore_0 pid=1425) return func(*args, **kwargs)
(EngineCore_0 pid=1425) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) File "/app/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 212, in load_model
(EngineCore_0 pid=1425) self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_0 pid=1425) File "/app/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1986, in load_model
(EngineCore_0 pid=1425) self.model = model_loader.load_model(
(EngineCore_0 pid=1425) ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/model_loader/base_loader.py", line 44, in load_model
(EngineCore_0 pid=1425) model = initialize_model(vllm_config=vllm_config,
(EngineCore_0 pid=1425) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/model_loader/utils.py", line 63, in initialize_model
(EngineCore_0 pid=1425) return model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore_0 pid=1425) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py", line 465, in __init__
(EngineCore_0 pid=1425) self.model = Qwen2Model(vllm_config=vllm_config,
(EngineCore_0 pid=1425) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) File "/app/.venv/lib/python3.12/site-packages/vllm/compilation/decorators.py", line 183, in __init__
(EngineCore_0 pid=1425) old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
(EngineCore_0 pid=1425) File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py", line 316, in __init__
(EngineCore_0 pid=1425) self.start_layer, self.end_layer, self.layers = make_layers(
(EngineCore_0 pid=1425) ^^^^^^^^^^^^
(EngineCore_0 pid=1425) File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 641, in make_layers
(EngineCore_0 pid=1425) maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
(EngineCore_0 pid=1425) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py", line 318, in <lambda>
(EngineCore_0 pid=1425) lambda prefix: decoder_layer_type(config=config,
(EngineCore_0 pid=1425) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py", line 216, in __init__
(EngineCore_0 pid=1425) self.self_attn = Qwen2Attention(
(EngineCore_0 pid=1425) ^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py", line 162, in __init__
(EngineCore_0 pid=1425) self.attn = Attention(
(EngineCore_0 pid=1425) ^^^^^^^^^^
(EngineCore_0 pid=1425) File "/app/.venv/lib/python3.12/site-packages/vllm/attention/layer.py", line 175, in __init__
(EngineCore_0 pid=1425) self.impl = impl_cls(num_heads, head_size, scale, num_kv_heads,
(EngineCore_0 pid=1425) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) TypeError: FlashAttentionImpl.__init__() got an unexpected keyword argument 'layer_idx'
[rank0]:[W901 13:49:51.134763764 ProcessGroupNCCL.cpp:1479] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=91) Traceback (most recent call last):
(APIServer pid=91) File "<frozen runpy>", line 198, in _run_module_as_main
(APIServer pid=91) File "<frozen runpy>", line 88, in _run_code
(APIServer pid=91) File "/app/.venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1920, in <module>
(APIServer pid=91) uvloop.run(run_server(args))
(APIServer pid=91) File "/app/.venv/lib/python3.12/site-packages/uvloop/__init__.py", line 109, in run
(APIServer pid=91) return __asyncio.run(
(APIServer pid=91) ^^^^^^^^^^^^^^
(APIServer pid=91) File "/root/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=91) return runner.run(main)
(APIServer pid=91) ^^^^^^^^^^^^^^^^
(APIServer pid=91) File "/root/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=91) return self._loop.run_until_complete(task)
(APIServer pid=91) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=91) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=91) File "/app/.venv/lib/python3.12/site-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=91) return await main
(APIServer pid=91) ^^^^^^^^^^
(APIServer pid=91) File "/app/.venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1850, in run_server
(APIServer pid=91) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=91) File "/app/.venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1870, in run_server_worker
(APIServer pid=91) async with build_async_engine_client(
(APIServer pid=91) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=91) File "/root/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=91) return await anext(self.gen)
(APIServer pid=91) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=91) File "/app/.venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 178, in build_async_engine_client
(APIServer pid=91) async with build_async_engine_client_from_engine_args(
(APIServer pid=91) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=91) File "/root/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=91) return await anext(self.gen)
(APIServer pid=91) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=91) File "/app/.venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 220, in build_async_engine_client_from_engine_args
(APIServer pid=91) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=91) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=91) File "/app/.venv/lib/python3.12/site-packages/vllm/utils/__init__.py", line 1557, in inner
(APIServer pid=91) return fn(*args, **kwargs)
(APIServer pid=91) ^^^^^^^^^^^^^^^^^^^
(APIServer pid=91) File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 174, in from_vllm_config
(APIServer pid=91) return cls(
(APIServer pid=91) ^^^^
(APIServer pid=91) File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 120, in __init__
(APIServer pid=91) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=91) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=91) File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=91) return AsyncMPClient(*client_args)
(APIServer pid=91) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=91) File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 767, in __init__
(APIServer pid=91) super().__init__(
(APIServer pid=91) File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 446, in __init__
(APIServer pid=91) with launch_core_engines(vllm_config, executor_class,
(APIServer pid=91) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=91) File "/root/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=91) next(self.gen)
(APIServer pid=91) File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 706, in launch_core_engines
(APIServer pid=91) wait_for_engine_startup(
(APIServer pid=91) File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 759, in wait_for_engine_startup
(APIServer pid=91) raise RuntimeError("Engine core initialization failed. "
(APIServer pid=91) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
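The traceback ends in `vllm/attention/layer.py`, where `Attention.__init__` calls `impl_cls(num_heads, head_size, scale, num_kv_heads, ...)` and the selected backend's `FlashAttentionImpl.__init__` rejects the `layer_idx` keyword. The log does not show which code path adds `layer_idx` for this model. The sparse attention config loaded from the model snapshot is one candidate, but that is an assumption. The stdlib-only sketch below reproduces only the failure mode. The class names are hypothetical stand-ins, not vLLM classes, and `accepts_kwarg` is an illustrative helper for checking whether a constructor can take a given keyword.

```python
import inspect


class FlashAttnImplStub:
    """Hypothetical stand-in: its __init__ has no `layer_idx` parameter."""

    def __init__(self, num_heads, head_size, scale, num_kv_heads):
        self.num_heads = num_heads
        self.head_size = head_size
        self.scale = scale
        self.num_kv_heads = num_kv_heads


class DualChunkImplStub(FlashAttnImplStub):
    """Hypothetical stand-in for a backend whose __init__ does accept `layer_idx`."""

    def __init__(self, num_heads, head_size, scale, num_kv_heads, layer_idx=None):
        super().__init__(num_heads, head_size, scale, num_kv_heads)
        self.layer_idx = layer_idx


def accepts_kwarg(cls, name):
    """Return True if cls.__init__ declares `name` or takes **kwargs."""
    params = inspect.signature(cls.__init__).parameters
    return name in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )


def build_impl(impl_cls, **extra_kwargs):
    # Same call shape as the failing frame: four positional args plus any
    # extra keyword arguments (here, `layer_idx`). The numbers are illustrative.
    return impl_cls(40, 128, 128 ** -0.5, 8, **extra_kwargs)


if __name__ == "__main__":
    try:
        build_impl(FlashAttnImplStub, layer_idx=0)
    except TypeError as exc:
        print(f"TypeError: {exc}")
    print(accepts_kwarg(FlashAttnImplStub, "layer_idx"))
    print(build_impl(DualChunkImplStub, layer_idx=0).layer_idx)
```

Running it prints a `TypeError` naming `'layer_idx'`, then `False`, then `0`. This shows that the error comes from the mismatch between the keywords the caller forwards and the constructor signature of the chosen backend, not from the model weights.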
🐛 Describe the bug
I tried to run:
nohup python -m vllm.entrypoints.openai.api_server \
--model Qwen/Qwen2.5-14B-Instruct-1M \
--served-model-name model \
--api-key "${LLM_API_KEY}" \
--max-model-len 1010000 \
--tensor-parallel-size 1 \
--uvicorn-log-level error \
--gpu-memory-utilization 0.80 \
--max-num-seqs 32 \
--enable-chunked-prefill \
--max-num-batched-tokens 131072 \
--dtype half \
--enforce-eager \
--port 8001 \
> vllm.log 2>&1 &
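Long multi-line shell commands are prone to quoting and line-continuation mistakes. One way to rule these out when reproducing is to build the same launch as an argument list. The sketch below copies every flag from the nohup command above. The `build_server_args` helper is hypothetical, and launching via `subprocess` is an assumption about how the server might be started from Python.

```python
import os
import shlex


def build_server_args(api_key: str) -> list[str]:
    # Each flag and its value is a separate list element, so no stray "=" or
    # missing space before a trailing backslash can corrupt a value.
    return [
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", "Qwen/Qwen2.5-14B-Instruct-1M",
        "--served-model-name", "model",
        "--api-key", api_key,
        "--max-model-len", "1010000",
        "--tensor-parallel-size", "1",
        "--uvicorn-log-level", "error",
        "--gpu-memory-utilization", "0.80",
        "--max-num-seqs", "32",
        "--enable-chunked-prefill",
        "--max-num-batched-tokens", "131072",
        "--dtype", "half",
        "--enforce-eager",
        "--port", "8001",
    ]


if __name__ == "__main__":
    args = build_server_args(os.environ.get("LLM_API_KEY", ""))
    # Print a correctly quoted, copy-pasteable command with the key redacted.
    redacted = [
        "<redacted>" if i > 0 and args[i - 1] == "--api-key" else a
        for i, a in enumerate(args)
    ]
    print(shlex.join(redacted))
    # To launch instead, pass `args` to subprocess.Popen and redirect output to vllm.log.
```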
Dependencies from pyproject.toml:
"vllm==0.10.1.1",
"transformers>=4.55.0",
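`transformers>=4.55.0` is a range rather than a pin, so the version actually resolved in the environment may matter. Here is a small stdlib sketch using `importlib.metadata` that prints what is installed for the two listed packages. The package list is an assumption and can be extended, for example with `torch`.

```python
from importlib.metadata import PackageNotFoundError, version

# The two packages listed in pyproject.toml; add others (e.g. "torch") as needed.
PINNED = ("vllm", "transformers")


def installed_versions(packages=PINNED):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```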