
[LOAD MODEL] Can't load gemma3 models #6193

@johncalesp

Description


System Info

Currently I can't load the model google/gemma-3-4b-it; it appears to be an issue with the model config. Loading worked up to commit 9354114, but the changes in the next commit, e0836f9, seem to have broken it.
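
To confirm which build is in use when reproducing the regression between those two commits, a quick check (assuming tensorrt_llm is importable in the environment; __version__ is the standard attribute):

import tensorrt_llm

# Print the installed TensorRT-LLM version string to pin down the build.
print(tensorrt_llm.__version__)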

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Run the command:
trtllm-serve serve google/gemma-3-4b-it --host 0.0.0.0 --port 9010 --backend pytorch --max_seq_len 8192 --extra_llm_api_options config.yml
where config.yml contains:

cuda_graph_config: null
attn_backend: "FLASHINFER"
kv_cache_config:
  enable_block_reuse: false
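
For anyone reproducing without the server, here is a minimal sketch of the same setup through the Python LLM API. It assumes the top-level LLM entry point with the PyTorch backend, and that attn_backend and kv_cache_config map one-to-one onto the YAML options above; exact kwarg names may differ between commits:

from tensorrt_llm import LLM
from tensorrt_llm.llmapi import KvCacheConfig

# Mirrors config.yml: FLASHINFER attention, KV-cache block reuse disabled.
llm = LLM(
    model="google/gemma-3-4b-it",
    max_seq_len=8192,
    attn_backend="FLASHINFER",  # assumption: exposed as a top-level kwarg here
    kv_cache_config=KvCacheConfig(enable_block_reuse=False),
)

# The failure happens at load time, so reaching generate() means the bug is gone.
print(llm.generate(["Hello"])[0].outputs[0].text)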

Expected behavior

The model should load and the endpoint should respond.

Actual behavior

Error:

[07/18/2025-20:27:59] [TRT-LLM] [E] Failed to initialize executor on rank 0: Cannot determine model architecture from config
[07/18/2025-20:27:59] [TRT-LLM] [E] Traceback (most recent call last):
  File "/code/tensorrt_llm/tensorrt_llm/executor/worker.py", line 738, in worker_main
    worker: GenerationExecutorWorker = worker_cls(
                                       ^^^^^^^^^^^
  File "/code/tensorrt_llm/tensorrt_llm/executor/worker.py", line 139, in __init__
    self.engine = _create_engine()
                  ^^^^^^^^^^^^^^^^
  File "/code/tensorrt_llm/tensorrt_llm/executor/worker.py", line 137, in _create_engine
    return create_executor(**args)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/code/tensorrt_llm/tensorrt_llm/_torch/pyexecutor/py_executor_creator.py", line 229, in create_py_executor
    model_engine = PyTorchModelEngine(
                   ^^^^^^^^^^^^^^^^^^^
  File "/code/tensorrt_llm/tensorrt_llm/_torch/pyexecutor/model_engine.py", line 288, in __init__
    self.model = self._load_model(
                 ^^^^^^^^^^^^^^^^^
  File "/code/tensorrt_llm/tensorrt_llm/_torch/pyexecutor/model_engine.py", line 999, in _load_model
    weight_mapper = checkpoint_loader.get_initilized_weight_mapper(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/code/tensorrt_llm/tensorrt_llm/_torch/models/checkpoints/base_checkpoint_loader.py", line 81, in get_initilized_weight_mapper
    raise ValueError(
ValueError: Cannot determine model architecture from config

The above exception was the direct cause of the following exception:
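
One quick way to see what the loader has to work with is to inspect the checkpoint's config via the Hugging Face transformers API. Gemma 3's multimodal checkpoints publish Gemma3ForConditionalGeneration with the language model under a nested text_config, which may be what the architecture lookup trips on (the diagnostic below uses standard transformers calls; the nested-config theory is my assumption about the cause):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("google/gemma-3-4b-it")
print(config.architectures)        # e.g. ["Gemma3ForConditionalGeneration"]
print(type(config).__name__)       # top-level config class
print(getattr(config, "text_config", None) is not None)  # nested text config?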

Additional notes

The tests were done on a single RTX 4090 GPU. I don't think my environment has anything special; it looks like a model weight mapping issue.
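
For context on the weight-mapping guess: the raising function, get_initilized_weight_mapper in base_checkpoint_loader.py, presumably resolves the architecture name from the config against a registry of per-model weight mappers. A purely hypothetical sketch of that kind of lookup (MAPPER_REGISTRY and the key choice are illustrative, not the actual TRT-LLM code):

# Hypothetical illustration of the failing lookup, not the real registry.
MAPPER_REGISTRY = {
    "Gemma3ForCausalLM": "Gemma3WeightMapper",  # illustrative entry only
}

def resolve_weight_mapper(config):
    # Read the first architecture string from the checkpoint config, if any.
    arch = (getattr(config, "architectures", None) or [None])[0]
    mapper = MAPPER_REGISTRY.get(arch)
    if mapper is None:
        # Matches the error above when the registry misses the
        # checkpoint's architecture string.
        raise ValueError("Cannot determine model architecture from config")
    return mapper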
