Labels: bug
Description
System Info
Currently I can't load the model google/gemma-3-4b-it; this appears to be an issue with the model config handling. Loading worked until commit 9354114, but the changes in the next commit, e0836f9, seem to have broken it.
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Run the command:

```shell
trtllm-serve serve google/gemma-3-4b-it --host 0.0.0.0 --port 9010 --backend pytorch --max_seq_len 8192 --extra_llm_api_options config.yml
```
config.yml contains:

```yaml
cuda_graph_config: null
attn_backend: "FLASHINFER"
kv_cache_config:
  enable_block_reuse: false
```
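The same failure can also be reproduced in-process, which takes the serving layer out of the picture. This is only a sketch under the assumption that the PyTorch-backend LLM constructor accepts options mirroring the YAML keys above (kwarg names may differ between TRT-LLM versions):

```python
# Minimal in-process reproduction sketch with the tensorrt_llm LLM API.
# Assumption: the constructor accepts attn_backend and kv_cache_config kwargs
# mirroring the config.yml keys above; adjust names to your TRT-LLM version.
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import KvCacheConfig

llm = LLM(
    model="google/gemma-3-4b-it",
    max_seq_len=8192,
    attn_backend="FLASHINFER",  # assumed kwarg, same value as config.yml
    kv_cache_config=KvCacheConfig(enable_block_reuse=False),
)
# With the regression present, construction already fails with
# "ValueError: Cannot determine model architecture from config".
out = llm.generate("Hello")
print(out.outputs[0].text)
```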
Expected behavior
The model should load and the endpoint should respond, e.g. to a chat completion request like the one sketched below.
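For reference, such a request could look like this (a sketch assuming the OpenAI-compatible /v1 API that trtllm-serve normally exposes, on the port from the command above):

```python
# Client-side check once the server is up. Assumes trtllm-serve's usual
# OpenAI-compatible /v1 route; the api_key is a placeholder since the local
# server does not validate it.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9010/v1", api_key="dummy")
resp = client.chat.completions.create(
    model="google/gemma-3-4b-it",
    messages=[{"role": "user", "content": "Say hello."}],
    max_tokens=16,
)
print(resp.choices[0].message.content)
```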
Actual behavior
Error:

```
[07/18/2025-20:27:59] [TRT-LLM] [E] Failed to initialize executor on rank 0: Cannot determine model architecture from config
[07/18/2025-20:27:59] [TRT-LLM] [E] Traceback (most recent call last):
  File "/code/tensorrt_llm/tensorrt_llm/executor/worker.py", line 738, in worker_main
    worker: GenerationExecutorWorker = worker_cls(
                                       ^^^^^^^^^^^
  File "/code/tensorrt_llm/tensorrt_llm/executor/worker.py", line 139, in __init__
    self.engine = _create_engine()
                  ^^^^^^^^^^^^^^^^
  File "/code/tensorrt_llm/tensorrt_llm/executor/worker.py", line 137, in _create_engine
    return create_executor(**args)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/code/tensorrt_llm/tensorrt_llm/_torch/pyexecutor/py_executor_creator.py", line 229, in create_py_executor
    model_engine = PyTorchModelEngine(
                   ^^^^^^^^^^^^^^^^^^^
  File "/code/tensorrt_llm/tensorrt_llm/_torch/pyexecutor/model_engine.py", line 288, in __init__
    self.model = self._load_model(
                 ^^^^^^^^^^^^^^^^^
  File "/code/tensorrt_llm/tensorrt_llm/_torch/pyexecutor/model_engine.py", line 999, in _load_model
    weight_mapper = checkpoint_loader.get_initilized_weight_mapper(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/code/tensorrt_llm/tensorrt_llm/_torch/models/checkpoints/base_checkpoint_loader.py", line 81, in get_initilized_weight_mapper
    raise ValueError(
ValueError: Cannot determine model architecture from config
ValueError: Cannot determine model architecture from config

The above exception was the direct cause of the following exception:
```
Additional notes
The tests were done on a single RTX 4090 GPU. I don't think my environment has anything special; this looks like a model-architecture / weight-mapping issue.
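As a quick sanity check on what the checkpoint loader has to work with, the architecture name can be read straight from the Hugging Face config. This is a diagnostic sketch with plain transformers (not TensorRT-LLM); the expected values in the comments are assumptions about this checkpoint, not captured output:

```python
# Inspect the HF config that the checkpoint loader parses. Gemma-3 multimodal
# checkpoints ship a composite config with a nested text_config, which may be
# what the architecture lookup trips over after commit e0836f9 (assumption).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("google/gemma-3-4b-it")
print(config.architectures)   # expected: ['Gemma3ForConditionalGeneration']
print(type(config).__name__)  # composite Gemma-3 config class
```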