Rename eagle cache dir #19027
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
The way the eagle head caching works in vLLM today is:
- There is a base model. The base model gets a hash, which is used as its cache dir.
- The eagle head has its own model, which is pre-determined by the hash of the base model. The eagle head needs its own cache dir.

This PR updates the name of that hash dir to `{base_model}-{eagle_method}` for readability.
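For illustration, here is a minimal sketch of the naming scheme described above, assuming hypothetical helper names, hash inputs, and cache root; it is not vLLM's actual implementation:

```python
# Hypothetical sketch of the cache-dir naming described above; the helper
# names, hash inputs, and cache root are assumptions, not vLLM's real code.
import hashlib
from pathlib import Path

def base_model_cache_name(model_name: str, config_repr: str) -> str:
    # The base model gets a hash, which is used as its cache dir name
    # (the "{base_model}" part of the template).
    return hashlib.sha256(f"{model_name}:{config_repr}".encode()).hexdigest()[:16]

def eagle_cache_dir(cache_root: Path, base_model: str, eagle_method: str) -> Path:
    # The eagle head gets its own cache dir, named "{base_model}-{eagle_method}"
    # so that it is readable and tied to the base model it belongs to.
    return cache_root / f"{base_model}-{eagle_method}"

# Example (illustrative values only):
# root = Path.home() / ".cache" / "vllm"
# base = base_model_cache_name("meta-llama/Llama-3-8B-Instruct", "<config repr>")
# print(eagle_cache_dir(root, base, "eagle"))
```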
Test Plan:
- Ran `python vllm/examples/offline_inference/eagle.py` and checked the cache directory name.
Signed-off-by: rzou <[email protected]>
```python
# calls in a single model, please open an issue and let's discuss.
speculative_config = self.vllm_config.speculative_config
if (speculative_config is not None and speculative_config.use_eagle()):
    if compilation_counter.num_graphs_seen == 1:
```
If we have multiple layers or a graph break, how do we handle this?
This PR improves on the previous state; it doesn't change anything about either case.

> multiple layers

`support_torch_compile` gets applied on models with multiple layers. Example:
vllm/vllm/model_executor/models/gemma3.py, lines 345 to 346 at ca2f6b9:

```python
@support_torch_compile
class Gemma3Model(nn.Module):
```
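As a hedged illustration of that point (the decorator below is a made-up stand-in, not vLLM's `support_torch_compile`): decorating the whole `nn.Module` means all of its layers get traced into a single compiled graph.

```python
# Hypothetical stand-in for a class decorator like support_torch_compile;
# this is not vLLM's implementation, just an illustration that decorating
# the whole module compiles all of its layers as one graph.
import torch
import torch.nn as nn

def compile_whole_model(cls):
    # Replace forward with a compiled version; the whole model is one graph.
    cls.forward = torch.compile(cls.forward, fullgraph=True)
    return cls

@compile_whole_model
class TinyModel(nn.Module):
    def __init__(self, num_layers: int = 4, dim: int = 8):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))

    def forward(self, x):
        # The loop over layers is unrolled into the single traced graph.
        for layer in self.layers:
            x = torch.relu(layer(x))
        return x

# model = TinyModel()
# print(model(torch.randn(2, 8)).shape)  # all 4 layers compiled together
```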
> graph break

My understanding is that there are no graph breaks in vLLM; `fullgraph` is set to True by default:
vllm/vllm/compilation/wrapper.py, line 46 at ca2f6b9:

```python
fullgraph=envs.VLLM_TEST_DYNAMO_FULLGRAPH_CAPTURE,
```
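To make the `fullgraph` point concrete, here is a minimal sketch using plain PyTorch (not vLLM code; `f` is a made-up function): with `fullgraph=True`, `torch.compile` raises on a graph break instead of silently splitting the function into multiple graphs.

```python
# Minimal illustration, assuming plain PyTorch (not vLLM code): with
# fullgraph=True, a graph break raises an error instead of silently
# producing multiple compiled graphs.
import torch

def f(x):
    # Data-dependent Python branching on a tensor value causes a graph break.
    if x.sum() > 0:
        return x * 2
    return x - 1

compiled = torch.compile(f, fullgraph=True)
try:
    compiled(torch.randn(4))
except Exception as err:
    # torch.compile rejects the graph break under fullgraph=True.
    print("graph break rejected:", type(err).__name__)
```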
I added #19064 to address this problem; please take a look. The problem with this PR is that it cannot generalize to vision encoders in the future. I expect we might have the following compilation in the end:
not needed anymore