Labels: bug
Description
System Info
Currently I can't load the model google/gemma-3-4b-it; this appears to be an issue with the model config handling. Loading worked until commit 9354114, but the changes in the next commit, e0836f9, seem to have broken it.
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Run the command:

```shell
trtllm-serve serve google/gemma-3-4b-it --host 0.0.0.0 --port 9010 --backend pytorch --max_seq_len 8192 --extra_llm_api_options config.yml
```
config.yml contains:

```yaml
cuda_graph_config: null
attn_backend: "FLASHINFER"
kv_cache_config:
  enable_block_reuse: false
```
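The same failure can also be reproduced in-process, which takes the serving layer out of the picture. This is only a sketch under the assumption that the PyTorch-backend LLM constructor accepts options mirroring the YAML keys above (kwarg names may differ between TRT-LLM versions):

```python
# Minimal in-process reproduction sketch with the tensorrt_llm LLM API.
# Assumption: the constructor accepts attn_backend and kv_cache_config kwargs
# mirroring the config.yml keys above; adjust names to your TRT-LLM version.
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import KvCacheConfig

llm = LLM(
    model="google/gemma-3-4b-it",
    max_seq_len=8192,
    attn_backend="FLASHINFER",  # assumed kwarg, same value as config.yml
    kv_cache_config=KvCacheConfig(enable_block_reuse=False),
)
# With the regression present, construction already fails with
# "ValueError: Cannot determine model architecture from config".
out = llm.generate("Hello")
print(out.outputs[0].text)
```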
Expected behavior
The model should load and the endpoint should respond, e.g. to a chat completion request like the one sketched below.
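For reference, such a request could look like this (a sketch assuming the OpenAI-compatible /v1 API that trtllm-serve normally exposes, on the port from the command above):

```python
# Client-side check once the server is up. Assumes trtllm-serve's usual
# OpenAI-compatible /v1 route; the api_key is a placeholder since the local
# server does not validate it.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9010/v1", api_key="dummy")
resp = client.chat.completions.create(
    model="google/gemma-3-4b-it",
    messages=[{"role": "user", "content": "Say hello."}],
    max_tokens=16,
)
print(resp.choices[0].message.content)
```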
Actual behavior
Error:

```
[07/18/2025-20:27:59] [TRT-LLM] [E] Failed to initialize executor on rank 0: Cannot determine model architecture from config
[07/18/2025-20:27:59] [TRT-LLM] [E] Traceback (most recent call last):
  File "/code/tensorrt_llm/tensorrt_llm/executor/worker.py", line 738, in worker_main
    worker: GenerationExecutorWorker = worker_cls(
                                       ^^^^^^^^^^^
  File "/code/tensorrt_llm/tensorrt_llm/executor/worker.py", line 139, in __init__
    self.engine = _create_engine()
                  ^^^^^^^^^^^^^^^^
  File "/code/tensorrt_llm/tensorrt_llm/executor/worker.py", line 137, in _create_engine
    return create_executor(**args)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/code/tensorrt_llm/tensorrt_llm/_torch/pyexecutor/py_executor_creator.py", line 229, in create_py_executor
    model_engine = PyTorchModelEngine(
                   ^^^^^^^^^^^^^^^^^^^
  File "/code/tensorrt_llm/tensorrt_llm/_torch/pyexecutor/model_engine.py", line 288, in __init__
    self.model = self._load_model(
                 ^^^^^^^^^^^^^^^^^
  File "/code/tensorrt_llm/tensorrt_llm/_torch/pyexecutor/model_engine.py", line 999, in _load_model
    weight_mapper = checkpoint_loader.get_initilized_weight_mapper(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/code/tensorrt_llm/tensorrt_llm/_torch/models/checkpoints/base_checkpoint_loader.py", line 81, in get_initilized_weight_mapper
    raise ValueError(
ValueError: Cannot determine model architecture from config
ValueError: Cannot determine model architecture from config

The above exception was the direct cause of the following exception:
```
Additional notes
The tests were done on a single RTX 4090 GPU. I don't think my environment has anything special; this looks like a model-architecture / weight-mapping issue.
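As a quick sanity check on what the checkpoint loader has to work with, the architecture name can be read straight from the Hugging Face config. This is a diagnostic sketch with plain transformers (not TensorRT-LLM); the expected values in the comments are assumptions about this checkpoint, not captured output:

```python
# Inspect the HF config that the checkpoint loader parses. Gemma-3 multimodal
# checkpoints ship a composite config with a nested text_config, which may be
# what the architecture lookup trips over after commit e0836f9 (assumption).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("google/gemma-3-4b-it")
print(config.architectures)   # expected: ['Gemma3ForConditionalGeneration']
print(type(config).__name__)  # composite Gemma-3 config class
```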