
[Bug]: 'FusedMoE' object has no attribute 'kv_cache' when running a 1P1D test with PowerMoE-3b #21359

@Tangjef

Description


Your current environment

The output of python collect_env.py
OS                           : Ubuntu 22.04.5 LTS (x86_64)
GCC version                  : (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
CMake version                : version 4.0.3
Libc version                 : glibc-2.35

PyTorch version              : 2.7.0+cu128
Is debug build               : False
CUDA used to build PyTorch   : 12.8

Python version               : 3.12.11 (main, Jun  4 2025, 08:56:18) [GCC 11.4.0] (64-bit runtime)
Python platform              : Linux-5.15.0-78-generic-x86_64-with-glibc2.35

Is CUDA available            : True
CUDA runtime version         : 12.8.93
CUDA_MODULE_LOADING set to   : LAZY
GPU models and configuration : 
GPU 0: NVIDIA H100 80GB HBM3


🐛 Describe the bug

'FusedMoE' object has no attribute 'kv_cache' when running a 1P1D test with PowerMoE-3b.

Following the example /vllm/examples/online_serving/disaggregated_serving_p2p_nccl_xpyd/disagg_example_p2p_nccl_xpyd.sh, I ran these steps:

  1. Start the proxy.

  2. Start the prefill node.

  3. Start the decode node.

  4. Execute the command:

curl http://127.0.0.1:10001/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "/models/PowerMoE-3b",
        "prompt": "San Francisco is a",
        "max_tokens": 8,
        "temperature": 0
    }'
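To replay the same request from a script, here is a minimal standard-library sketch. It assumes the proxy address, model path, and sampling parameters from the curl command above; the helper names `build_request` and `send` are illustrative only and are not part of vLLM.

```python
import json
import urllib.error
import urllib.request

# Proxy endpoint taken from the curl command in this report.
URL = "http://127.0.0.1:10001/v1/completions"


def build_request(url: str = URL) -> urllib.request.Request:
    """Build the same POST /v1/completions request as the curl command."""
    payload = {
        "model": "/models/PowerMoE-3b",
        "prompt": "San Francisco is a",
        "max_tokens": 8,
        "temperature": 0,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> tuple[int, str]:
    """Send the request; HTTP errors are returned as (status, body), not raised."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8")
```

Calling `send(build_request())` requires the proxy, prefill node, and decode node to be running. The failing run in this report logs `500 Internal Server Error` for the request, and `send` hands such an error back as a status code instead of raising.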

Error log:

ERROR 07-22 00:57:43 [core.py:588] EngineCore encountered a fatal error.
ERROR 07-22 00:57:43 [core.py:588] Traceback (most recent call last):
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 579, in run_engine_core
ERROR 07-22 00:57:43 [core.py:588]     engine_core.run_busy_loop()
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 606, in run_busy_loop
ERROR 07-22 00:57:43 [core.py:588]     self._process_engine_step()
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 631, in _process_engine_step
ERROR 07-22 00:57:43 [core.py:588]     outputs, model_executed = self.step_fn()
ERROR 07-22 00:57:43 [core.py:588]                               ^^^^^^^^^^^^^^
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 235, in step
ERROR 07-22 00:57:43 [core.py:588]     model_output = self.execute_model(scheduler_output)
ERROR 07-22 00:57:43 [core.py:588]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 221, in execute_model
ERROR 07-22 00:57:43 [core.py:588]     raise err
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 212, in execute_model
ERROR 07-22 00:57:43 [core.py:588]     return self.model_executor.execute_model(scheduler_output)
ERROR 07-22 00:57:43 [core.py:588]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 87, in execute_model
ERROR 07-22 00:57:43 [core.py:588]     output = self.collective_rpc("execute_model",
ERROR 07-22 00:57:43 [core.py:588]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
ERROR 07-22 00:57:43 [core.py:588]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 07-22 00:57:43 [core.py:588]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 2736, in run_method
ERROR 07-22 00:57:43 [core.py:588]     return func(*args, **kwargs)
ERROR 07-22 00:57:43 [core.py:588]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 07-22 00:57:43 [core.py:588]     return func(*args, **kwargs)
ERROR 07-22 00:57:43 [core.py:588]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 308, in execute_model
ERROR 07-22 00:57:43 [core.py:588]     output = self.model_runner.execute_model(scheduler_output,
ERROR 07-22 00:57:43 [core.py:588]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 07-22 00:57:43 [core.py:588]     return func(*args, **kwargs)
ERROR 07-22 00:57:43 [core.py:588]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1372, in execute_model
ERROR 07-22 00:57:43 [core.py:588]     self.maybe_setup_kv_connector(scheduler_output)
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1719, in maybe_setup_kv_connector
ERROR 07-22 00:57:43 [core.py:588]     kv_connector.start_load_kv(get_forward_context())
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/kv_transfer/kv_connector/v1/p2p/p2p_nccl_connector.py", line 197, in start_load_kv
ERROR 07-22 00:57:43 [core.py:588]     kv_cache_layer = attn_layer.kv_cache[ \
ERROR 07-22 00:57:43 [core.py:588]                      ^^^^^^^^^^^^^^^^^^^
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1940, in __getattr__
ERROR 07-22 00:57:43 [core.py:588]     raise AttributeError(
ERROR 07-22 00:57:43 [core.py:588] AttributeError: 'FusedMoE' object has no attribute 'kv_cache'
Process EngineCore_0:
ERROR 07-22 00:57:43 [async_llm.py:419] AsyncLLM output_handler failed.
ERROR 07-22 00:57:43 [async_llm.py:419] Traceback (most recent call last):
ERROR 07-22 00:57:43 [async_llm.py:419]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 378, in output_handler
ERROR 07-22 00:57:43 [async_llm.py:419]     outputs = await engine_core.get_output_async()
ERROR 07-22 00:57:43 [async_llm.py:419]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-22 00:57:43 [async_llm.py:419]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 740, in get_output_async
ERROR 07-22 00:57:43 [async_llm.py:419]     raise self._format_exception(outputs) from None
ERROR 07-22 00:57:43 [async_llm.py:419] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
INFO:     - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 590, in run_engine_core
    raise e
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 579, in run_engine_core
    engine_core.run_busy_loop()
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 606, in run_busy_loop
    self._process_engine_step()
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 631, in _process_engine_step
    outputs, model_executed = self.step_fn()
                              ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 235, in step
    model_output = self.execute_model(scheduler_output)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 221, in execute_model
    raise err
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 212, in execute_model
    return self.model_executor.execute_model(scheduler_output)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 87, in execute_model
    output = self.collective_rpc("execute_model",
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
    answer = run_method(self.driver_worker, method, args, kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 2736, in run_method
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 308, in execute_model
    output = self.model_runner.execute_model(scheduler_output,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1372, in execute_model
    self.maybe_setup_kv_connector(scheduler_output)
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1719, in maybe_setup_kv_connector
    kv_connector.start_load_kv(get_forward_context())
  File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/kv_transfer/kv_connector/v1/p2p/p2p_nccl_connector.py", line 197, in start_load_kv
    kv_cache_layer = attn_layer.kv_cache[ \
                     ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1940, in __getattr__
    raise AttributeError(
AttributeError: 'FusedMoE' object has no attribute 'kv_cache'
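The traceback ends in `start_load_kv` in `p2p_nccl_connector.py`, which reads `attn_layer.kv_cache` for each layer it receives from the forward context; a `FusedMoE` module evidently reached that loop without having a `kv_cache` attribute. The sketch below reproduces that pattern with hypothetical stand-in classes (not vLLM's real ones) and shows one possible defensive variant that skips layers without a KV cache. It is an illustration under these assumptions, not the upstream fix; the layer names and the `engine_index` parameter are made up for the example.

```python
from dataclasses import dataclass, field


# Hypothetical stand-in: an attention layer holding one KV buffer per engine index.
@dataclass
class AttentionLayer:
    kv_cache: list = field(default_factory=lambda: ["kv-buffer-0"])


# Hypothetical stand-in: like the MoE layer in the error, it has no kv_cache attribute.
@dataclass
class FusedMoE:
    num_experts: int = 8


def load_kv_naive(layers: dict, engine_index: int = 0) -> dict:
    """Index kv_cache on every layer, as the failing line does; MoE layers raise."""
    return {name: layer.kv_cache[engine_index] for name, layer in layers.items()}


def load_kv_defensive(layers: dict, engine_index: int = 0) -> dict:
    """Collect KV buffers only from layers that actually carry a kv_cache."""
    loaded = {}
    for name, layer in layers.items():
        kv_cache = getattr(layer, "kv_cache", None)
        if kv_cache is None:
            continue  # not an attention layer, so there is no KV cache to load
        loaded[name] = kv_cache[engine_index]
    return loaded


# Hypothetical mix of layers, similar to what a MoE model would register.
layers = {
    "layer0.attn": AttentionLayer(),
    "layer0.moe": FusedMoE(),
}
```

Here `load_kv_naive(layers)` fails with the same kind of `AttributeError` as in the log, while `load_kv_defensive(layers)` returns only the attention layer's buffer. Filtering by layer type instead of by attribute would be an equally plausible design.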

