
Commit 9cc4e5d

yizhang-nvdc3671 authored and committed
[nvbugs/5336321][fix] Enable attention dp = False test case, Fix TRTLLM Gen Moe workspace allocation (#5463)
Signed-off-by: Yi Zhang <[email protected]>
Signed-off-by: yizhan <[email protected]>
1 parent e5e87ec commit 9cc4e5d

File tree

2 files changed: +2 -2 lines changed

cpp/tensorrt_llm/thop/fp4BlockScaleMoe.cpp

Lines changed: 1 addition & 0 deletions

@@ -86,6 +86,7 @@ std::vector<torch::Tensor> run_fp4_block_scale_moe_runner(torch::Tensor const& r
     TORCH_CHECK(num_experts % 4 == 0, "Routing kernel expects that num_experts must be divisible by 4");
     TORCH_CHECK(num_experts > top_k, "num_experts must be greater than top_k");
+    TORCH_CHECK(num_experts <= 256, "num_experts must be less than or equal to 256");
 
     tensorrt_llm::kernels::trtllmGenFp8BlockScaleMoe::MoE::MoERunnerArgs args;
     tensorrt_llm::kernels::trtllmGenFp8BlockScaleMoe::MoE::MoEWorkspace workspace;
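The added check is the workspace-allocation side of the fix: together with the existing divisibility and top_k checks, it rejects expert counts the routing kernel and its workspace buffers are not prepared for. Below is a minimal standalone sketch of that validation, assuming a hypothetical helper name (validate_moe_routing_config is not a function in the repository) and assuming that 256 is the upper bound the workspace is sized for; it is not the actual TRT-LLM code path.

// Minimal sketch of the validation around the new bound; assumes PyTorch's
// C++ extension headers for TORCH_CHECK and a hypothetical helper name.
#include <torch/extension.h>

void validate_moe_routing_config(int64_t num_experts, int64_t top_k)
{
    // Existing check: the routing kernel processes experts in groups of 4.
    TORCH_CHECK(num_experts % 4 == 0,
                "Routing kernel expects that num_experts must be divisible by 4");
    // Existing check: top_k experts are selected per token, so num_experts must exceed top_k.
    TORCH_CHECK(num_experts > top_k, "num_experts must be greater than top_k");
    // Check added by this commit (assumption: workspace buffers are sized for at most 256 experts).
    TORCH_CHECK(num_experts <= 256, "num_experts must be less than or equal to 256");
}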

tests/integration/defs/accuracy/test_llm_api_pytorch.py

Lines changed: 1 addition & 2 deletions

@@ -1088,8 +1088,7 @@ def test_nvfp4_4gpus(self, fp8kv, attention_dp, cuda_graph,
             pytest.skip("https://nvbugs/5252313")
         if torch_compile and pp_size > 1:
             pytest.skip("PP with torch.compile is not supported yet.")
-        if not attention_dp and (tp_size > 1 or ep_size > 1):
-            pytest.skip("https://nvbugs/5336321")
+
         kv_cache_config = KvCacheConfig(free_gpu_memory_fraction=0.9)
         # Picewise Cuda Graph cannot be enabled for nvfp4 attention dp.
         torch_compile_config = TorchCompileConfig(
