[Bug]: 'FusedMoE' object has no attribute 'kv_cache' when running a 1P1D test with PowerMoE-3b

### Your current environment

<details>
<summary>The output of <code>python collect_env.py</code></summary>

```text
OS                           : Ubuntu 22.04.5 LTS (x86_64)
GCC version                  : (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
CMake version                : version 4.0.3
Libc version                 : glibc-2.35

PyTorch version              : 2.7.0+cu128
Is debug build               : False
CUDA used to build PyTorch   : 12.8

Python version               : 3.12.11 (main, Jun  4 2025, 08:56:18) [GCC 11.4.0] (64-bit runtime)
Python platform              : Linux-5.15.0-78-generic-x86_64-with-glibc2.35

Is CUDA available            : True
CUDA runtime version         : 12.8.93
CUDA_MODULE_LOADING set to   : LAZY
GPU models and configuration : 
GPU 0: NVIDIA H100 80GB HBM3


```


</details>


### 🐛 Describe the bug

`AttributeError: 'FusedMoE' object has no attribute 'kv_cache'` is raised when running a 1P1D (one prefill node, one decode node) test with PowerMoE-3b.

To reproduce, follow the example script `/vllm/examples/online_serving/disaggregated_serving_p2p_nccl_xpyd/disagg_example_p2p_nccl_xpyd.sh`:

1. Start the proxy.

2. Start the prefill node.

3. Start the decode node.

4. Send a completion request through the proxy:
```bash
curl http://127.0.0.1:10001/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "/models/PowerMoE-3b",
        "prompt": "San Francisco is a",
        "max_tokens": 8,
        "temperature": 0
    }'
```

The request returns `500 Internal Server Error` and the engine core logs the following error:
```
ERROR 07-22 00:57:43 [core.py:588] EngineCore encountered a fatal error.
ERROR 07-22 00:57:43 [core.py:588] Traceback (most recent call last):
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 579, in run_engine_core
ERROR 07-22 00:57:43 [core.py:588]     engine_core.run_busy_loop()
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 606, in run_busy_loop
ERROR 07-22 00:57:43 [core.py:588]     self._process_engine_step()
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 631, in _process_engine_step
ERROR 07-22 00:57:43 [core.py:588]     outputs, model_executed = self.step_fn()
ERROR 07-22 00:57:43 [core.py:588]                               ^^^^^^^^^^^^^^
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 235, in step
ERROR 07-22 00:57:43 [core.py:588]     model_output = self.execute_model(scheduler_output)
ERROR 07-22 00:57:43 [core.py:588]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 221, in execute_model
ERROR 07-22 00:57:43 [core.py:588]     raise err
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 212, in execute_model
ERROR 07-22 00:57:43 [core.py:588]     return self.model_executor.execute_model(scheduler_output)
ERROR 07-22 00:57:43 [core.py:588]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 87, in execute_model
ERROR 07-22 00:57:43 [core.py:588]     output = self.collective_rpc("execute_model",
ERROR 07-22 00:57:43 [core.py:588]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
ERROR 07-22 00:57:43 [core.py:588]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 07-22 00:57:43 [core.py:588]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 2736, in run_method
ERROR 07-22 00:57:43 [core.py:588]     return func(*args, **kwargs)
ERROR 07-22 00:57:43 [core.py:588]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 07-22 00:57:43 [core.py:588]     return func(*args, **kwargs)
ERROR 07-22 00:57:43 [core.py:588]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 308, in execute_model
ERROR 07-22 00:57:43 [core.py:588]     output = self.model_runner.execute_model(scheduler_output,
ERROR 07-22 00:57:43 [core.py:588]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 07-22 00:57:43 [core.py:588]     return func(*args, **kwargs)
ERROR 07-22 00:57:43 [core.py:588]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1372, in execute_model
ERROR 07-22 00:57:43 [core.py:588]     self.maybe_setup_kv_connector(scheduler_output)
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1719, in maybe_setup_kv_connector
ERROR 07-22 00:57:43 [core.py:588]     kv_connector.start_load_kv(get_forward_context())
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/kv_transfer/kv_connector/v1/p2p/p2p_nccl_connector.py", line 197, in start_load_kv
ERROR 07-22 00:57:43 [core.py:588]     kv_cache_layer = attn_layer.kv_cache[ \
ERROR 07-22 00:57:43 [core.py:588]                      ^^^^^^^^^^^^^^^^^^^
ERROR 07-22 00:57:43 [core.py:588]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1940, in __getattr__
ERROR 07-22 00:57:43 [core.py:588]     raise AttributeError(
ERROR 07-22 00:57:43 [core.py:588] AttributeError: 'FusedMoE' object has no attribute 'kv_cache'
Process EngineCore_0:
ERROR 07-22 00:57:43 [async_llm.py:419] AsyncLLM output_handler failed.
ERROR 07-22 00:57:43 [async_llm.py:419] Traceback (most recent call last):
ERROR 07-22 00:57:43 [async_llm.py:419]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 378, in output_handler
ERROR 07-22 00:57:43 [async_llm.py:419]     outputs = await engine_core.get_output_async()
ERROR 07-22 00:57:43 [async_llm.py:419]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-22 00:57:43 [async_llm.py:419]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 740, in get_output_async
ERROR 07-22 00:57:43 [async_llm.py:419]     raise self._format_exception(outputs) from None
ERROR 07-22 00:57:43 [async_llm.py:419] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
INFO:     - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 590, in run_engine_core
    raise e
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 579, in run_engine_core
    engine_core.run_busy_loop()
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 606, in run_busy_loop
    self._process_engine_step()
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 631, in _process_engine_step
    outputs, model_executed = self.step_fn()
                              ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 235, in step
    model_output = self.execute_model(scheduler_output)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 221, in execute_model
    raise err
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 212, in execute_model
    return self.model_executor.execute_model(scheduler_output)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 87, in execute_model
    output = self.collective_rpc("execute_model",
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
    answer = run_method(self.driver_worker, method, args, kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 2736, in run_method
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 308, in execute_model
    output = self.model_runner.execute_model(scheduler_output,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1372, in execute_model
    self.maybe_setup_kv_connector(scheduler_output)
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1719, in maybe_setup_kv_connector
    kv_connector.start_load_kv(get_forward_context())
  File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/kv_transfer/kv_connector/v1/p2p/p2p_nccl_connector.py", line 197, in start_load_kv
    kv_cache_layer = attn_layer.kv_cache[ \
                     ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1940, in __getattr__
    raise AttributeError(
AttributeError: 'FusedMoE' object has no attribute 'kv_cache'
```
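The last frames show `start_load_kv` in `p2p_nccl_connector.py` reading `attn_layer.kv_cache` on an object that turns out to be a `FusedMoE` layer, which has no such attribute. Below is a minimal sketch of a guard that skips such layers. It assumes the connector iterates over a name-to-layer mapping that can contain non-attention layers. The classes, the helper `iter_kv_cache_layers`, and the layer names are hypothetical stand-ins for illustration, not vLLM's actual code.

```python
from typing import Any, Dict, Iterator, Tuple


class FakeAttention:
    """Hypothetical stand-in for an attention layer that owns a KV cache."""

    def __init__(self) -> None:
        # Placeholder for the real cache tensors.
        self.kv_cache = [object()]


class FakeFusedMoE:
    """Hypothetical stand-in for an MoE layer: it has no `kv_cache` attribute."""


def iter_kv_cache_layers(layers: Dict[str, Any]) -> Iterator[Tuple[str, Any]]:
    """Yield (name, kv_cache) only for layers that expose a KV cache."""
    for name, layer in layers.items():
        # getattr with a default avoids the AttributeError seen in the traceback.
        kv_cache = getattr(layer, "kv_cache", None)
        if kv_cache is None:
            continue  # not an attention layer, nothing to load
        yield name, kv_cache


# Hypothetical layer names, chosen only for illustration.
layers = {
    "model.layers.0.self_attn.attn": FakeAttention(),
    "model.layers.0.block_sparse_moe.experts": FakeFusedMoE(),
}
names = [name for name, _ in iter_kv_cache_layers(layers)]
print(names)
```

Whether skipping these layers is the right fix, as opposed to filtering them out where the mapping is built, is something a maintainer would need to confirm.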





### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

[Bug]: 'FusedMoE' object has no attribute 'kv_cache' when running a 1P1D test with PowerMoE-3b #21359
