
Conversation

@liuzijing2014 (Collaborator) commented Jun 12, 2025

Purpose

Allow vLLM to run the text-only Llama4 model, i.e. Llama4ForCausalLM.

Test Plan

Run vLLM with a text-only Llama4 Maverick checkpoint (vendor-internal).
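
For reference, a minimal sketch of how such a checkpoint could be exercised offline once the architecture is registered; the model path is a placeholder (the vendor-internal checkpoint is not public), and the parallelism setting simply mirrors the 8-rank run in the logs below.

# Hedged sketch: offline generation with a text-only Llama4 checkpoint.
# The model path and tensor_parallel_size are placeholders, not the
# internal Maverick checkpoint referenced in the test plan.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/path/to/llama4-text-only-checkpoint",  # placeholder path
    tensor_parallel_size=8,                        # matches the 8-rank run below
)
params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["Explain tensor parallelism in one sentence."], params)
print(outputs[0].outputs[0].text)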

Test Result

Model successfully recognized and loaded:

Loading safetensors checkpoint shards:   0% Completed | 0/84 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  10% Completed | 8/84 [00:00<00:00, 77.14it/s]
Loading safetensors checkpoint shards:  19% Completed | 16/84 [00:00<00:01, 48.57it/s]
Loading safetensors checkpoint shards:  29% Completed | 24/84 [00:00<00:01, 53.53it/s]
Loading safetensors checkpoint shards:  36% Completed | 30/84 [00:00<00:00, 55.29it/s]
Loading safetensors checkpoint shards:  43% Completed | 36/84 [00:00<00:00, 49.49it/s]
Loading safetensors checkpoint shards:  50% Completed | 42/84 [00:00<00:01, 39.83it/s]
Loading safetensors checkpoint shards:  67% Completed | 56/84 [00:01<00:00, 56.72it/s]
Loading safetensors checkpoint shards:  77% Completed | 65/84 [00:01<00:00, 57.88it/s]
Loading safetensors checkpoint shards:  88% Completed | 74/84 [00:01<00:00, 64.70it/s]
Loading safetensors checkpoint shards:  96% Completed | 81/84 [00:01<00:00, 56.54it/s]
Loading safetensors checkpoint shards: 100% Completed | 84/84 [00:01<00:00, 56.55it/s]
(VllmWorker rank=0 pid=604590) 
(VllmWorker rank=7 pid=604598) INFO 06-12 14:37:50 [default_loader.py:272] Loading weights took 45.72 seconds
(VllmWorker rank=0 pid=604590) INFO 06-12 14:37:50 [default_loader.py:272] Loading weights took 45.64 seconds
(VllmWorker rank=2 pid=604593) INFO 06-12 14:37:50 [default_loader.py:272] Loading weights took 45.65 seconds
(VllmWorker rank=6 pid=604597) INFO 06-12 14:37:50 [default_loader.py:272] Loading weights took 45.62 seconds
(VllmWorker rank=4 pid=604595) INFO 06-12 14:37:50 [default_loader.py:272] Loading weights took 45.60 seconds
(VllmWorker rank=5 pid=604596) INFO 06-12 14:37:50 [default_loader.py:272] Loading weights took 45.70 seconds
(VllmWorker rank=1 pid=604591) INFO 06-12 14:37:50 [default_loader.py:272] Loading weights took 45.71 seconds
(VllmWorker rank=3 pid=604594) INFO 06-12 14:37:50 [default_loader.py:272] Loading weights took 45.71 seconds
(VllmWorker rank=3 pid=604594) INFO 06-12 14:37:51 [gpu_model_runner.py:1615] Model loading took 48.8683 GiB and 46.056898 seconds
(VllmWorker rank=0 pid=604590) INFO 06-12 14:37:51 [gpu_model_runner.py:1615] Model loading took 48.8683 GiB and 45.984370 seconds
(VllmWorker rank=4 pid=604595) INFO 06-12 14:37:51 [gpu_model_runner.py:1615] Model loading took 48.8683 GiB and 45.931201 seconds
(VllmWorker rank=5 pid=604596) INFO 06-12 14:37:51 [gpu_model_runner.py:1615] Model loading took 48.8683 GiB and 46.051216 seconds
(VllmWorker rank=1 pid=604591) INFO 06-12 14:37:51 [gpu_model_runner.py:1615] Model loading took 48.8683 GiB and 46.056561 seconds
(VllmWorker rank=6 pid=604597) INFO 06-12 14:37:51 [gpu_model_runner.py:1615] Model loading took 48.8683 GiB and 45.960544 seconds
(VllmWorker rank=2 pid=604593) INFO 06-12 14:37:51 [gpu_model_runner.py:1615] Model loading took 48.8683 GiB and 46.010001 seconds
(VllmWorker rank=7 pid=604598) INFO 06-12 14:37:51 [gpu_model_runner.py:1615] Model loading took 48.8683 GiB and 46.070654 seconds
Evaluation results on task gsm8k.8_shot.1_gen: em: 0.957500 | f1: 0.957500 | em_maj1@1: 0.957500 | f1_maj1@1: 0.957500

Signed-off-by: Zijing Liu <[email protected]>

@github-actions commented

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which covers a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@yeqcharlotte (Collaborator) commented

We deleted this from the registry because it used to break a bunch of CI that @ywang96 had to hack around to let it through. In particular, could you check the workarounds in #16113 and run pre-commit locally to make sure they are good?

@liuzijing2014 (Collaborator, Author) commented


pre-commit runs fine locally:

yapf.....................................................................Passed
ruff.....................................................................Passed
ruff-format..........................................(no files to check)Skipped
typos....................................................................Passed
isort....................................................................Passed
clang-format.........................................(no files to check)Skipped
PyMarkdown...........................................(no files to check)Skipped
Lint GitHub Actions workflow files...................(no files to check)Skipped
pip-compile..........................................(no files to check)Skipped
Run mypy for local Python installation...................................Passed
Lint shell scripts...................................(no files to check)Skipped
Lint PNG exports from excalidraw.....................(no files to check)Skipped
Check SPDX headers.......................................................Passed
Check for spaces in all filenames........................................Passed
Update Dockerfile dependency graph.......................................Passed
Enforce import regex as re...............................................Passed
Forbid direct 'import triton'............................................Passed
Prevent new pickle/cloudpickle imports...................................Passed
Suggestion...............................................................Passed
- hook id: suggestion
- duration: 0s

To bypass pre-commit hooks, add --no-verify to git commit.

Sign-off Commit..........................................................Passed

I will wait and see if there are any CI failure signals.

@houseroad requested a review from @ywang96 on June 12, 2025.
@houseroad added the ready (ONLY add when PR is ready to merge/full CI is needed) label on June 12, 2025.
@houseroad (Collaborator) left a comment:

Put it on hold, and just check the CI.

"LlamaForCausalLM": ("llama", "LlamaForCausalLM"),
# For decapoda-research/llama-*
"LLaMAForCausalLM": ("llama", "LlamaForCausalLM"),
"Llama4ForCausalLM": ("llama4", "Llama4ForCausalLM"),
@ywang96 (Member) commented:
Currently, our basic-models-test assumes that every tested architecture has a corresponding Hugging Face model repository to test with.

_EXAMPLE_MODELS = {
**_TEXT_GENERATION_EXAMPLE_MODELS,
**_EMBEDDING_EXAMPLE_MODELS,
**_CROSS_ENCODER_EXAMPLE_MODELS,
**_MULTIMODAL_EXAMPLE_MODELS,
**_SPECULATIVE_DECODING_EXAMPLE_MODELS,
**_TRANSFORMERS_MODELS,
}

Do you think it's possible to add a dummy model repo on HF with the architecture Llama4ForCausalLM? Alternatively, you will need to modify test_registry.py for CI to pass.
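
As an illustration of the dummy-repo route, here is a hedged sketch of the kind of tiny config.json such a repo could carry; every value is a placeholder chosen to keep the dummy model small, and "llama4_text" as the model_type is an assumption rather than a verified name.

# Hedged sketch: write a minimal config.json exposing the Llama4ForCausalLM
# architecture for a dummy HF repo. Sizes are tiny placeholders; the
# model_type value is an assumption, not verified against transformers.
import json

dummy_config = {
    "architectures": ["Llama4ForCausalLM"],
    "model_type": "llama4_text",  # assumption
    "hidden_size": 16,
    "intermediate_size": 32,
    "num_attention_heads": 2,
    "num_key_value_heads": 1,
    "num_hidden_layers": 2,
    "vocab_size": 128,
}

with open("config.json", "w") as f:
    json.dump(dummy_config, f, indent=2)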

"LlamaForCausalLM": ("llama", "LlamaForCausalLM"),
# For decapoda-research/llama-*
"LLaMAForCausalLM": ("llama", "LlamaForCausalLM"),
"Llama4ForCausalLM": ("llama4", "Llama4ForCausalLM"),
@ywang96 (Member) commented Jun 13, 2025:

On a related note, I think the proper way to support text-only usage of models released as "natively multimodal", like Llama4 or Mistral Small 3.1, is to add a --language-model-only mode.
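
To make the idea concrete, a hedged sketch of what such a mode might do conceptually: keep only the text sub-config of a multimodal config and remap the architecture to the text-only class. The key names follow a typical HF Llama4-style config layout and are assumptions, not vLLM internals or an actual flag implementation.

# Hedged sketch of a hypothetical --language-model-only transformation:
# drop the vision sub-config and point the architecture at Llama4ForCausalLM.
# Key names are assumptions for illustration only.
def to_language_model_only(hf_config: dict) -> dict:
    """Return a text-only view of a multimodal Llama4-style config dict."""
    text_config = dict(hf_config.get("text_config", hf_config))
    text_config["architectures"] = ["Llama4ForCausalLM"]
    return text_config

if __name__ == "__main__":
    multimodal = {
        "architectures": ["Llama4ForConditionalGeneration"],
        "text_config": {"model_type": "llama4_text", "hidden_size": 5120},
        "vision_config": {"model_type": "llama4_vision_model"},
    }
    print(to_language_model_only(multimodal))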

A Collaborator commented:

Maybe we should just go with the --language-model-only solution? @liuzijing2014, thoughts?

@liuzijing2014 (Collaborator, Author) commented:

I see. I will try out this idea for Llama4.

@ywang96 (Member) commented:
@liuzijing2014 Happy to collaborate on this! This was one of the items that I'm planning to work on too :)

Labels: ready (ONLY add when PR is ready to merge/full CI is needed), v1