Investigate TRTLLM runtime repetitive issue #5254

@Fridah-nv

Description

For the models listed below, the trtllm runtime generates outputs that are repetitive and less coherent than demollm's outputs:

EleutherAI/pythia-6.9b
HuggingFaceTB/SmolVLM2-2.2B-Instruct
allenai/OLMo-2-1124-7B-SFT
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
meta-llama/CodeLlama-7b-Python-hf
meta-llama/Llama-3.2-1B-Instruct
mistralai/Mistral-Large-Instruct-2407
mistralai/Mistral-Nemo-Instruct-2407
nvidia/Llama-3.1-Minitron-4B-Width-Base

Let's take a closer look to see whether there's any misconfiguration in the trtllm runtime.
This analysis is based on the Jun 1st dashboard run. See the 6/1/2025 tab of the coverage Google Sheet for the specific outputs.
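
Repetitive, low-coherence generations often trace back to the effective sampling configuration (e.g. greedy decoding with no temperature or repetition penalty applied) rather than the runtime itself, so one quick check is to run an affected model through the TRT-LLM high-level LLM API with explicit sampling settings and compare the text against demollm's output for the same prompt. A minimal sketch, assuming the `tensorrt_llm` LLM API as shown in the quickstart; the model, prompt, and sampling values below are illustrative, not the dashboard's actual configuration:

```python
# Minimal repro sketch (assumption: tensorrt_llm's high-level LLM API as in the
# quickstart). Run one affected model with explicit sampling settings so a
# repetitive completion can be traced to configuration rather than the runtime.
from tensorrt_llm import LLM, SamplingParams


def generate_once(model_id: str, prompt: str) -> str:
    llm = LLM(model=model_id)
    # Explicit non-greedy sampling; values are illustrative, not the dashboard's.
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text


if __name__ == "__main__":
    # Hypothetical prompt; any model from the list above can be substituted.
    print(generate_once(
        "meta-llama/Llama-3.2-1B-Instruct",
        "Explain what a KV cache is in one short paragraph.",
    ))
```

If the completion stops repeating once explicit sampling is set, the issue is likely a defaults mismatch between the trtllm and demollm paths rather than a runtime bug.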

Metadata

Labels

AutoDeploy, Inference runtime, bug

Status

Done
