[Bug]: TypeError: FlashAttentionImpl.__init__() got an unexpected keyword argument 'layer_idx' in Qwen/Qwen2.5-14B-Instruct-1M #24048

### Your current environment

```
INFO 09-01 13:49:38 [__init__.py:241] Automatically detected platform cuda.
(APIServer pid=91) INFO 09-01 13:49:38 [api_server.py:1805] vLLM API server version 0.10.1.1
(APIServer pid=91) INFO 09-01 13:49:38 [utils.py:326] non-default args: {'port': 8001, 'uvicorn_log_level': 'error', 'api_key': ['key_123'], 'model': 'Qwen/Qwen2.5-14B-Instruct-1M', 'dtype': 'half', 'max_model_len': 1010000, 'enforce_eager': True, 'served_model_name': ['model'], 'gpu_memory_utilization': 0.8, 'max_num_batched_tokens': 131072, 'max_num_seqs': 32, 'enable_chunked_prefill': True}
(APIServer pid=91) INFO 09-01 13:49:43 [__init__.py:711] Resolved architecture: Qwen2ForCausalLM
(APIServer pid=91) `torch_dtype` is deprecated! Use `dtype` instead!
(APIServer pid=91) WARNING 09-01 13:49:43 [__init__.py:2819] Casting torch.bfloat16 to torch.float16.
(APIServer pid=91) INFO 09-01 13:49:43 [__init__.py:1750] Using max model len 1010000
(APIServer pid=91) INFO 09-01 13:49:43 [scheduler.py:222] Chunked prefill is enabled with max_num_batched_tokens=131072.
(APIServer pid=91) INFO 09-01 13:49:45 [weight_utils.py:254] Loaded sparse attention config from /root/.cache/huggingface/hub/models--Qwen--Qwen2.5-14B-Instruct-1M/snapshots/620fad32de7bdd2293b3d99b39eba2fe63e97438/sparse_attention_config.json
(APIServer pid=91) INFO 09-01 13:49:45 [__init__.py:3565] Cudagraph is disabled under eager mode
INFO 09-01 13:49:48 [__init__.py:241] Automatically detected platform cuda.
(EngineCore_0 pid=1425) INFO 09-01 13:49:49 [core.py:636] Waiting for init message from front-end.
(EngineCore_0 pid=1425) INFO 09-01 13:49:49 [core.py:74] Initializing a V1 LLM engine (v0.10.1.1) with config: model='Qwen/Qwen2.5-14B-Instruct-1M', speculative_config=None, tokenizer='Qwen/Qwen2.5-14B-Instruct-1M', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=1010000, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=model, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":null,"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":0,"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"pass_config":{},"max_capture_size":0,"local_cache_dir":null}
(EngineCore_0 pid=1425) INFO 09-01 13:49:50 [parallel_state.py:1134] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
(EngineCore_0 pid=1425) INFO 09-01 13:49:50 [topk_topp_sampler.py:50] Using FlashInfer for top-p & top-k sampling.
(EngineCore_0 pid=1425) INFO 09-01 13:49:50 [gpu_model_runner.py:1953] Starting to load model Qwen/Qwen2.5-14B-Instruct-1M...
(EngineCore_0 pid=1425) INFO 09-01 13:49:50 [gpu_model_runner.py:1985] Loading model from scratch...
(EngineCore_0 pid=1425) INFO 09-01 13:49:50 [cuda.py:328] Using Flash Attention backend on V1 engine.
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] EngineCore failed to start.
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] Traceback (most recent call last):
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]   File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 691, in run_engine_core
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]   File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 492, in __init__
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]   File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 80, in __init__
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]     self.model_executor = executor_class(vllm_config)
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]   File "/app/.venv/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]     self._init_executor()
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]   File "/app/.venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 49, in _init_executor
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]     self.collective_rpc("load_model")
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]   File "/app/.venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]     answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]   File "/app/.venv/lib/python3.12/site-packages/vllm/utils/__init__.py", line 3007, in run_method
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]     return func(*args, **kwargs)
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]   File "/app/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 212, in load_model
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]     self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]   File "/app/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1986, in load_model
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]     self.model = model_loader.load_model(
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]                  ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]   File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/model_loader/base_loader.py", line 44, in load_model
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]     model = initialize_model(vllm_config=vllm_config,
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]   File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/model_loader/utils.py", line 63, in initialize_model
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]     return model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]   File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py", line 465, in __init__
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]     self.model = Qwen2Model(vllm_config=vllm_config,
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]   File "/app/.venv/lib/python3.12/site-packages/vllm/compilation/decorators.py", line 183, in __init__
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]     old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]   File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py", line 316, in __init__
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]     self.start_layer, self.end_layer, self.layers = make_layers(
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]                                                     ^^^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]   File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 641, in make_layers
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]     maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]   File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py", line 318, in <lambda>
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]     lambda prefix: decoder_layer_type(config=config,
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]   File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py", line 216, in __init__
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]     self.self_attn = Qwen2Attention(
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]                      ^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]   File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py", line 162, in __init__
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]     self.attn = Attention(
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]                 ^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]   File "/app/.venv/lib/python3.12/site-packages/vllm/attention/layer.py", line 175, in __init__
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]     self.impl = impl_cls(num_heads, head_size, scale, num_kv_heads,
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) ERROR 09-01 13:49:50 [core.py:700] TypeError: FlashAttentionImpl.__init__() got an unexpected keyword argument 'layer_idx'
(EngineCore_0 pid=1425) Process EngineCore_0:
(EngineCore_0 pid=1425) Traceback (most recent call last):
(EngineCore_0 pid=1425)   File "/root/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_0 pid=1425)     self.run()
(EngineCore_0 pid=1425)   File "/root/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_0 pid=1425)     self._target(*self._args, **self._kwargs)
(EngineCore_0 pid=1425)   File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 704, in run_engine_core
(EngineCore_0 pid=1425)     raise e
(EngineCore_0 pid=1425)   File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 691, in run_engine_core
(EngineCore_0 pid=1425)     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_0 pid=1425)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425)   File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 492, in __init__
(EngineCore_0 pid=1425)     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_0 pid=1425)   File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 80, in __init__
(EngineCore_0 pid=1425)     self.model_executor = executor_class(vllm_config)
(EngineCore_0 pid=1425)                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425)   File "/app/.venv/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_0 pid=1425)     self._init_executor()
(EngineCore_0 pid=1425)   File "/app/.venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 49, in _init_executor
(EngineCore_0 pid=1425)     self.collective_rpc("load_model")
(EngineCore_0 pid=1425)   File "/app/.venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_0 pid=1425)     answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_0 pid=1425)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425)   File "/app/.venv/lib/python3.12/site-packages/vllm/utils/__init__.py", line 3007, in run_method
(EngineCore_0 pid=1425)     return func(*args, **kwargs)
(EngineCore_0 pid=1425)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425)   File "/app/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 212, in load_model
(EngineCore_0 pid=1425)     self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_0 pid=1425)   File "/app/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1986, in load_model
(EngineCore_0 pid=1425)     self.model = model_loader.load_model(
(EngineCore_0 pid=1425)                  ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425)   File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/model_loader/base_loader.py", line 44, in load_model
(EngineCore_0 pid=1425)     model = initialize_model(vllm_config=vllm_config,
(EngineCore_0 pid=1425)             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425)   File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/model_loader/utils.py", line 63, in initialize_model
(EngineCore_0 pid=1425)     return model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore_0 pid=1425)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425)   File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py", line 465, in __init__
(EngineCore_0 pid=1425)     self.model = Qwen2Model(vllm_config=vllm_config,
(EngineCore_0 pid=1425)                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425)   File "/app/.venv/lib/python3.12/site-packages/vllm/compilation/decorators.py", line 183, in __init__
(EngineCore_0 pid=1425)     old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
(EngineCore_0 pid=1425)   File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py", line 316, in __init__
(EngineCore_0 pid=1425)     self.start_layer, self.end_layer, self.layers = make_layers(
(EngineCore_0 pid=1425)                                                     ^^^^^^^^^^^^
(EngineCore_0 pid=1425)   File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 641, in make_layers
(EngineCore_0 pid=1425)     maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
(EngineCore_0 pid=1425)                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425)   File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py", line 318, in <lambda>
(EngineCore_0 pid=1425)     lambda prefix: decoder_layer_type(config=config,
(EngineCore_0 pid=1425)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425)   File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py", line 216, in __init__
(EngineCore_0 pid=1425)     self.self_attn = Qwen2Attention(
(EngineCore_0 pid=1425)                      ^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425)   File "/app/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py", line 162, in __init__
(EngineCore_0 pid=1425)     self.attn = Attention(
(EngineCore_0 pid=1425)                 ^^^^^^^^^^
(EngineCore_0 pid=1425)   File "/app/.venv/lib/python3.12/site-packages/vllm/attention/layer.py", line 175, in __init__
(EngineCore_0 pid=1425)     self.impl = impl_cls(num_heads, head_size, scale, num_kv_heads,
(EngineCore_0 pid=1425)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1425) TypeError: FlashAttentionImpl.__init__() got an unexpected keyword argument 'layer_idx'
[rank0]:[W901 13:49:51.134763764 ProcessGroupNCCL.cpp:1479] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=91) Traceback (most recent call last):
(APIServer pid=91)   File "<frozen runpy>", line 198, in _run_module_as_main
(APIServer pid=91)   File "<frozen runpy>", line 88, in _run_code
(APIServer pid=91)   File "/app/.venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1920, in <module>
(APIServer pid=91)     uvloop.run(run_server(args))
(APIServer pid=91)   File "/app/.venv/lib/python3.12/site-packages/uvloop/__init__.py", line 109, in run
(APIServer pid=91)     return __asyncio.run(
(APIServer pid=91)            ^^^^^^^^^^^^^^
(APIServer pid=91)   File "/root/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=91)     return runner.run(main)
(APIServer pid=91)            ^^^^^^^^^^^^^^^^
(APIServer pid=91)   File "/root/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=91)     return self._loop.run_until_complete(task)
(APIServer pid=91)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=91)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=91)   File "/app/.venv/lib/python3.12/site-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=91)     return await main
(APIServer pid=91)            ^^^^^^^^^^
(APIServer pid=91)   File "/app/.venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1850, in run_server
(APIServer pid=91)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=91)   File "/app/.venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1870, in run_server_worker
(APIServer pid=91)     async with build_async_engine_client(
(APIServer pid=91)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=91)   File "/root/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=91)     return await anext(self.gen)
(APIServer pid=91)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=91)   File "/app/.venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 178, in build_async_engine_client
(APIServer pid=91)     async with build_async_engine_client_from_engine_args(
(APIServer pid=91)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=91)   File "/root/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=91)     return await anext(self.gen)
(APIServer pid=91)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=91)   File "/app/.venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 220, in build_async_engine_client_from_engine_args
(APIServer pid=91)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=91)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=91)   File "/app/.venv/lib/python3.12/site-packages/vllm/utils/__init__.py", line 1557, in inner
(APIServer pid=91)     return fn(*args, **kwargs)
(APIServer pid=91)            ^^^^^^^^^^^^^^^^^^^
(APIServer pid=91)   File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 174, in from_vllm_config
(APIServer pid=91)     return cls(
(APIServer pid=91)            ^^^^
(APIServer pid=91)   File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 120, in __init__
(APIServer pid=91)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=91)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=91)   File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=91)     return AsyncMPClient(*client_args)
(APIServer pid=91)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=91)   File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 767, in __init__
(APIServer pid=91)     super().__init__(
(APIServer pid=91)   File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 446, in __init__
(APIServer pid=91)     with launch_core_engines(vllm_config, executor_class,
(APIServer pid=91)          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=91)   File "/root/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=91)     next(self.gen)
(APIServer pid=91)   File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 706, in launch_core_engines
(APIServer pid=91)     wait_for_engine_startup(
(APIServer pid=91)   File "/app/.venv/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 759, in wait_for_engine_startup
(APIServer pid=91)     raise RuntimeError("Engine core initialization failed. "
(APIServer pid=91) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
```

### 🐛 Describe the bug

Starting the server with the following command fails during model loading with the `TypeError` shown above:

```
nohup python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2.5-14B-Instruct-1M \
  --served-model-name model \
  --api-key "${LLM_API_KEY}" \
  --max-model-len 1010000 \
  --tensor-parallel-size 1 \
  --uvicorn-log-level error \
  --gpu-memory-utilization 0.80 \
  --max-num-seqs 32 \
  --enable-chunked-prefill \
  --max-num-batched-tokens 131072 \
  --dtype half \
  --enforce-eager \
  --port 8001 \
  > vllm.log 2>&1 &
```
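
The last frame of the traceback is `impl_cls(num_heads, head_size, scale, num_kv_heads, ...)` in `vllm/attention/layer.py`, and the error says the selected backend's constructor does not declare `layer_idx`. The sketch below reproduces only that failure mode in isolation. `ImplWithoutLayerIdx`, `accepts_kwarg`, and `reproduce` are hypothetical names for illustration and are not vLLM's classes or functions; the numeric arguments are arbitrary example values.

```python
import inspect


class ImplWithoutLayerIdx:
    """Hypothetical stand-in for an attention backend (not vLLM's class).

    Its constructor declares no `layer_idx` parameter and no **kwargs.
    """

    def __init__(self, num_heads, head_size, scale, num_kv_heads):
        self.num_heads = num_heads
        self.head_size = head_size
        self.scale = scale
        self.num_kv_heads = num_kv_heads


def accepts_kwarg(cls, name):
    """Return True if cls.__init__ can be called with keyword `name`."""
    params = inspect.signature(cls.__init__).parameters
    # A **kwargs parameter accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return True
    param = params.get(name)
    return param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )


def reproduce():
    """Pass an extra `layer_idx` keyword and return the resulting error text."""
    try:
        # Arbitrary example values; only the extra keyword matters here.
        ImplWithoutLayerIdx(40, 128, 128 ** -0.5, 8, layer_idx=0)
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(accepts_kwarg(ImplWithoutLayerIdx, "layer_idx"))
    print(reproduce())
```

A check like `accepts_kwarg` could in principle be pointed at the real backend class in an installed vLLM to confirm which keywords its constructor takes, though that has not been done here.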


Dependencies from `pyproject.toml`:
```
  "vllm==0.10.1.1",
  "transformers>=4.55.0",
```
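
Because `transformers` is pinned only with a lower bound, the version actually resolved in the environment may differ from what the pin suggests. A minimal sketch for printing the installed versions, using only the standard library; `installed_version` and `report` are hypothetical helper names:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def report(packages=("vllm", "transformers")):
    """Map each package name to its installed version (or None)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, ver in report().items():
        print(f"{name}=={ver}")
```

Running this inside the same virtual environment as the server (here `/app/.venv`, per the traceback paths) gives the exact versions to attach to the report.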

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
