8 changes: 8 additions & 0 deletions tests/spec_decode/e2e/test_eagle_correctness.py
@@ -370,6 +370,10 @@ def test_llama2_eagle_e2e_greedy_correctness(vllm_runner, common_llm_kwargs,
@pytest.mark.parametrize(
"common_llm_kwargs",
[{
# 2 for small prompt, 256//16 for generated.
"num_gpu_blocks_override": 2 + 256 // 16,
"max_model_len": (2 + 256 // 16) * 16,
Comment on lines +374 to +375 (Contributor, severity: medium):
The numeric literals 2, 256, and 16 are used directly in the calculations for num_gpu_blocks_override and max_model_len. While the inline comment provides some context, defining these values as named constants (e.g., PROMPT_BLOCK_COUNT, GENERATED_TOKEN_COUNT, KV_CACHE_BLOCK_SIZE) would enhance readability and make the purpose of these numbers explicit. This practice helps prevent errors if the values need to be changed in the future, as it centralizes their definition. Consider defining these constants at a higher scope within the test file.
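A minimal sketch of the suggested refactor, assuming the constant names proposed above (PROMPT_BLOCK_COUNT, GENERATED_TOKEN_COUNT, KV_CACHE_BLOCK_SIZE); none of these names exist in the current file, and the test signature is abbreviated for illustration:

```python
import pytest

# Hypothetical constant names -- not in the current test file.
PROMPT_BLOCK_COUNT = 2        # KV-cache blocks reserved for the small prompt
GENERATED_TOKEN_COUNT = 256   # tokens the test generates
KV_CACHE_BLOCK_SIZE = 16      # tokens per KV-cache block

# 2 + 256 // 16 == 18 blocks; 18 * 16 == 288 tokens of max context.
NUM_GPU_BLOCKS = PROMPT_BLOCK_COUNT + GENERATED_TOKEN_COUNT // KV_CACHE_BLOCK_SIZE


@pytest.mark.parametrize(
    "common_llm_kwargs",
    [{
        "num_gpu_blocks_override": NUM_GPU_BLOCKS,
        "max_model_len": NUM_GPU_BLOCKS * KV_CACHE_BLOCK_SIZE,
        # Skip cuda graph recording for fast test.
        "enforce_eager": True,
    }])
def test_llama2_eagle_e2e_greedy_correctness(common_llm_kwargs):
    ...
```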

Comment on lines +373 to +375 (Contributor, severity: medium):
The configuration block for common_llm_kwargs, including the num_gpu_blocks_override and max_model_len settings, is duplicated in both test_llama2_eagle_e2e_greedy_correctness and test_llama3_eagle_e2e_greedy_correctness. To improve maintainability and adhere to the DRY (Don't Repeat Yourself) principle, consider defining this common configuration once (e.g., as a shared dictionary or a common fixture) and reusing it across the relevant pytest.mark.parametrize decorators.
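A sketch of one way to apply this suggestion, using a hypothetical module-level dictionary shared by both parametrize decorators; the name COMMON_EAGLE_LLM_KWARGS is illustrative and the test signatures are abbreviated:

```python
import pytest

# Hypothetical shared configuration, defined once at module scope.
COMMON_EAGLE_LLM_KWARGS = {
    # 2 for small prompt, 256//16 for generated.
    "num_gpu_blocks_override": 2 + 256 // 16,
    "max_model_len": (2 + 256 // 16) * 16,
    # Skip cuda graph recording for fast test.
    "enforce_eager": True,
}


@pytest.mark.parametrize("common_llm_kwargs", [COMMON_EAGLE_LLM_KWARGS])
def test_llama2_eagle_e2e_greedy_correctness(common_llm_kwargs):
    ...


@pytest.mark.parametrize("common_llm_kwargs", [COMMON_EAGLE_LLM_KWARGS])
def test_llama3_eagle_e2e_greedy_correctness(common_llm_kwargs):
    ...
```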


# Skip cuda graph recording for fast test.
"enforce_eager": True,

@@ -420,6 +424,10 @@ def test_llama3_eagle_e2e_greedy_correctness(vllm_runner, common_llm_kwargs,
@pytest.mark.parametrize(
"common_llm_kwargs",
[{
# 2 for small prompt, 256//16 for generated.
"num_gpu_blocks_override": 2 + 256 // 16,
"max_model_len": (2 + 256 // 16) * 16,

# Skip cuda graph recording for fast test.
"enforce_eager": True,
