
vLLM cannot run modelopt quantized weights #228

@Pernekhan

Description

Describe the bug

vLLM cannot run ModelOpt-quantized weights. After following the FP8 quantization example in examples/llm_ptq, the FP8 weights were generated successfully, but trying to run them with vLLM fails with an error.

Steps/Code to reproduce bug

# Quantize and export FP8 weights with ModelOpt (examples/llm_ptq)
export HF_PATH=https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506
scripts/huggingface_example.sh --model $HF_PATH --quant fp8 --export_fmt=hf

# Load the exported checkpoint with vLLM and generate
from vllm import LLM

llm_fp8 = LLM(model="<the exported model path>", quantization="modelopt")
print(llm_fp8.generate(["What's the age of the earth? "]))
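For reference, a minimal sanity check of the exported checkpoint before handing it to vLLM. This is only a sketch, assuming the HF-format export writes a config.json (and possibly an hf_quant_config.json) next to the weights; exact file names and keys may vary across ModelOpt versions, and the checkpoint path is a placeholder.

import json
from pathlib import Path

# Placeholder: replace with the real export directory.
ckpt = Path("<the exported model path>")

# List the exported files so missing pieces (tokenizer, quant config) are obvious.
print(sorted(p.name for p in ckpt.iterdir()))

# The HF export is expected to carry quantization metadata either inside
# config.json ("quantization_config") or in a separate hf_quant_config.json.
config = json.loads((ckpt / "config.json").read_text())
print(config.get("quantization_config", "no quantization_config in config.json"))

quant_cfg = ckpt / "hf_quant_config.json"
if quant_cfg.exists():
    print(json.loads(quant_cfg.read_text()))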

Expected behavior

The model should load without failing or crashing and generate a response.

System information

  • Container used (if applicable): ?
  • OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): Ubuntu 24.04.1 LTS
  • CPU architecture (x86_64, aarch64): x86_64
  • GPU name (e.g. H100, A100, L40S): NVIDIA H100 80GB HBM3
  • GPU memory size: 79.6 GB
  • Number of GPUs: 4
  • Library versions (if applicable):
    • Python: 3.12.3
    • ModelOpt version or commit hash: 0.31.0
    • CUDA: 12.8
    • PyTorch: 2.7.0a0+7c8ec84dab.nv25.03
    • Transformers: 4.51.0
    • TensorRT-LLM: 0.19.0
    • ONNXRuntime: 1.22.0
    • TensorRT: 10.9.0.34
  • Any other details that may help: ?
