docs/source/deployment-guide/quick-start-recipe-for-llama4-scout-on-trtllm.md
Lines changed: 0 additions, 5 deletions
````diff
@@ -65,14 +65,9 @@ cuda_graph_config:
   max_batch_size: 1024
 kv_cache_config:
   dtype: fp8
-use_torch_sampler: true
 EOF
 ```
 
-> Here `use_torch_sampler: true` is added as a temporary WAR to solve illegal memory access issue when using trtllm native sampler.
->
-> TODO: Remove this after the issue is resolved
-
 ### Launch the TRT-LLM Server
 
 Below is an example command to launch the TRT-LLM server with the Llama-4-Scout-17B-16E-Instruct-FP8 model from within the container. The command is specifically configured for the 1024/1024 Input/Output Sequence Length test. The explanation of each flag is shown in the “Configs and Parameters” section.
````
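The recipe's extra-options file no longer sets `use_torch_sampler`, and the same commit replaces that boolean field in `TorchLlmArgs` with `sampler_type`. If you maintain an options file written against the older recipe, a translation along these lines may help. This is a minimal sketch under stated assumptions: `migrate_extra_options` is a hypothetical helper (not a TensorRT-LLM API), the options are assumed to be already parsed into a plain dict, and `use_torch_sampler: false` is mapped to `TRTLLMSampler` to preserve the old default behavior rather than to the new `auto` default.

```python
from typing import Any, Dict

# Hypothetical mapping from the removed boolean flag to the new sampler_type
# values. The strings match the SamplerType members added in llm_args.py.
# False maps to TRTLLMSampler because that was the old default behavior.
_LEGACY_TO_SAMPLER_TYPE = {True: "TorchSampler", False: "TRTLLMSampler"}


def migrate_extra_options(options: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `options` with `use_torch_sampler` replaced by `sampler_type`.

    Hypothetical helper for illustration only; not part of TensorRT-LLM.
    """
    migrated = dict(options)  # shallow copy: the caller's dict is not modified
    if "use_torch_sampler" in migrated:
        legacy = bool(migrated.pop("use_torch_sampler"))
        # Keep an explicit sampler_type if the file already sets one.
        migrated.setdefault("sampler_type", _LEGACY_TO_SAMPLER_TYPE[legacy])
    return migrated


# Example: the options from the older version of this recipe.
old_recipe = {
    "cuda_graph_config": {"max_batch_size": 1024},
    "kv_cache_config": {"dtype": "fp8"},
    "use_torch_sampler": True,
}
print(migrate_extra_options(old_recipe))
```

Because the field description says `auto` already selects TorchSampler when beam search is not requested, simply deleting the key, as the recipe now does, should yield the same sampler in the common case.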
tensorrt_llm/llmapi/llm_args.py
Lines changed: 12 additions, 5 deletions
```diff
@@ -1968,6 +1968,13 @@ class LoadFormat(Enum):
     DUMMY = 1
 
 
+class SamplerType(StrEnum):
+    """Enum for sampler type options."""
+    TRTLLMSampler = "TRTLLMSampler"
+    TorchSampler = "TorchSampler"
+    auto = "auto"
+
+
 class TorchCompileConfig(StrictBaseModel):
     """
     Configuration for torch.compile.
```
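`SamplerType` subclasses `StrEnum`, so each member is also a `str`. That is presumably what lets the new `sampler_type` field accept either a plain string (for example, from a YAML options file) or an enum member. The sketch below reproduces the enum in isolation to show this behavior. The `try`/`except` import fallback is an assumption added so the snippet runs on Python versions older than 3.11; the diff does not show where `llm_args.py` imports `StrEnum` from.

```python
from enum import Enum

try:
    from enum import StrEnum  # available in the standard library from Python 3.11
except ImportError:  # assumed fallback for older interpreters
    class StrEnum(str, Enum):
        """Minimal stand-in: members are str instances whose str() is the value."""

        def __str__(self) -> str:
            return str(self.value)


class SamplerType(StrEnum):
    """Enum for sampler type options (same members as in the diff)."""
    TRTLLMSampler = "TRTLLMSampler"
    TorchSampler = "TorchSampler"
    auto = "auto"


# Members compare equal to their string values...
print(SamplerType.TorchSampler == "TorchSampler")
# ...and a raw string can be converted to a member by value lookup.
print(SamplerType("auto") is SamplerType.auto)
```

Value lookup is case-sensitive, so a string such as `"torchsampler"` raises `ValueError` instead of silently matching a member.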
```diff
@@ -2055,11 +2062,11 @@ class TorchLlmArgs(BaseLlmArgs):
         "If true, will iterate over sampling_params of each request and use the corresponding sampling strategy, e.g. top-k, top-p, etc.",
         status="beta")
 
-    use_torch_sampler: bool = Field(
-        default=False,
+    sampler_type: Union[str, SamplerType] = Field(
+        default=SamplerType.auto,
         description=
-        "If true, will use the Torch sampler instead of the TRTLLM sampler.",
-        status="beta")
+        "The type of sampler to use. Options are TRTLLMSampler, TorchSampler or auto. Defaults to auto, which will use TorchSampler unless BeamSearch is requested.",
```
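The new `sampler_type` field description states that `auto` uses TorchSampler unless beam search is requested. A minimal sketch of that selection rule follows. `resolve_sampler_type` and its `use_beam_search` parameter are hypothetical names, not TensorRT-LLM's actual implementation, and the sketch assumes that `auto` with beam search falls back to `TRTLLMSampler`, the only other option listed. It also does not check whether an explicitly requested sampler supports beam search.

```python
from enum import Enum
from typing import Union


class SamplerType(str, Enum):
    # Stand-in for the SamplerType enum added in llm_args.py. The str mix-in
    # keeps the sketch runnable on Python < 3.11, where enum.StrEnum is missing.
    TRTLLMSampler = "TRTLLMSampler"
    TorchSampler = "TorchSampler"
    auto = "auto"


def resolve_sampler_type(sampler_type: Union[str, SamplerType],
                         use_beam_search: bool) -> SamplerType:
    """Hypothetical helper: map the configured value to a concrete sampler.

    `auto` selects TorchSampler unless beam search is requested, per the field
    description; picking TRTLLMSampler for beam search is an assumption.
    """
    # Accept raw strings (e.g. from YAML) or enum members alike; an unknown
    # name raises ValueError here instead of failing later.
    requested = SamplerType(sampler_type)
    if requested is not SamplerType.auto:
        return requested  # explicit choices are passed through unchanged
    return SamplerType.TRTLLMSampler if use_beam_search else SamplerType.TorchSampler


print(resolve_sampler_type("auto", use_beam_search=False))
print(resolve_sampler_type(SamplerType.auto, use_beam_search=True))
```

Normalizing through `SamplerType(...)` first means a misspelled value fails fast, and downstream code only ever compares enum members rather than a mix of strings and members.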