[CI Fix] Try fixing eagle e2e test OOM by reducing block allocation #20213
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -370,6 +370,10 @@ def test_llama2_eagle_e2e_greedy_correctness(vllm_runner, common_llm_kwargs, | |
| @pytest.mark.parametrize( | ||
| "common_llm_kwargs", | ||
| [{ | ||
| # 2 for small prompt, 256//16 for generated. | ||
| "num_gpu_blocks_override": 2 + 256 // 16, | ||
| "max_model_len": (2 + 256 // 16) * 16, | ||
| Comment on lines +373 to +375 (Contributor): The configuration block for |
| # Skip cuda graph recording for fast test. | ||
| "enforce_eager": True, | ||
@@ -420,6 +424,10 @@ def test_llama3_eagle_e2e_greedy_correctness(vllm_runner, common_llm_kwargs, | |
| @pytest.mark.parametrize( | ||
| "common_llm_kwargs", | ||
| [{ | ||
| # 2 for small prompt, 256//16 for generated. | ||
| "num_gpu_blocks_override": 2 + 256 // 16, | ||
| "max_model_len": (2 + 256 // 16) * 16, | ||
| # Skip cuda graph recording for fast test. | ||
| "enforce_eager": True, | ||
The numeric literals `2`, `256`, and `16` are used directly in the calculations for `num_gpu_blocks_override` and `max_model_len`. While the inline comment provides some context, defining these values as named constants (e.g., `PROMPT_BLOCK_COUNT`, `GENERATED_TOKEN_COUNT`, `KV_CACHE_BLOCK_SIZE`) would enhance readability and make the purpose of these numbers explicit. This practice helps prevent errors if the values need to be changed in the future, as it centralizes their definition. Consider defining these constants at a higher scope within the test file.
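A minimal sketch of what that suggestion might look like, assuming the constants live at module scope in the test file. The constant names come from the comment above; `NUM_GPU_BLOCKS`, `MAX_MODEL_LEN`, and `EAGLE_COMMON_LLM_KWARGS` are hypothetical names introduced here for illustration and are not part of the PR.

```python
# Hypothetical module-level constants (names follow the review suggestion).
PROMPT_BLOCK_COUNT = 2        # blocks reserved for the small prompt
GENERATED_TOKEN_COUNT = 256   # tokens generated by the test
KV_CACHE_BLOCK_SIZE = 16      # tokens per block, as implied by "256 // 16"

# Same arithmetic as the diff: 2 + 256 // 16 blocks in total.
NUM_GPU_BLOCKS = PROMPT_BLOCK_COUNT + GENERATED_TOKEN_COUNT // KV_CACHE_BLOCK_SIZE
# Model length covers exactly the tokens that fit in those blocks.
MAX_MODEL_LEN = NUM_GPU_BLOCKS * KV_CACHE_BLOCK_SIZE

# Hypothetical shared kwargs dict that both eagle tests could parametrize over.
EAGLE_COMMON_LLM_KWARGS = {
    "num_gpu_blocks_override": NUM_GPU_BLOCKS,
    "max_model_len": MAX_MODEL_LEN,
    # Skip cuda graph recording for fast test.
    "enforce_eager": True,
}
```

With the values from the diff this works out to 18 blocks and a model length of 288 tokens. Sharing one dict would also keep the llama2 and llama3 eagle tests from drifting apart if the numbers change.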