
Commit d69b27e

doc: remove outdated features marked as Experimental
Signed-off-by: nv-guomingz <[email protected]>
1 parent c7ffadf commit d69b27e

File tree

4 files changed

+2
-17
lines changed


docs/source/advanced/gpt-attention.md

Lines changed: 0 additions & 2 deletions
@@ -112,8 +112,6 @@ printed.
 #### XQA Optimization
 
 Another optimization for MQA/GQA in generation phase called XQA optimization.
-It is still experimental feature and support limited configurations. LLAMA2 70B
-is one model that it supports.
 
 Support matrix of the XQA optimization:
 - FP16 / BF16 compute data type.
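XQA kernel selection happens at runtime, not at build time. A minimal sketch of steering it from Python, assuming the `TRTLLM_FORCE_XQA` / `TRTLLM_DISABLE_XQA` environment variables described elsewhere in gpt-attention.md (the variable names are an assumption and may differ across TensorRT-LLM versions):

```python
# Sketch only: TRTLLM_FORCE_XQA / TRTLLM_DISABLE_XQA are assumed names taken
# from gpt-attention.md; confirm them against your TensorRT-LLM version.
import os
from typing import Optional


def configure_xqa(force: Optional[bool]) -> dict:
    """Set env vars steering XQA kernel selection; None leaves the heuristic alone."""
    env = {}
    if force is True:
        env["TRTLLM_FORCE_XQA"] = "1"    # always try the XQA kernels
    elif force is False:
        env["TRTLLM_DISABLE_XQA"] = "1"  # never use the XQA kernels
    os.environ.update(env)
    return env
```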

docs/source/advanced/speculative-decoding.md

Lines changed: 1 addition & 1 deletion
@@ -167,7 +167,7 @@ TensorRT-LLM implements the ReDrafter model such that logits prediction, beam se
 
 The EAGLE approach enhances the single-model Medusa method by predicting and verifying tokens using the same model. Similarly to ReDrafter, it predicts draft tokens using a recurrent predictor where each draft token depends on the previous one. However, unlike ReDrafter, it uses a single-layer transformer model to predict draft tokens from previous hidden states and decoded tokens. In the EAGLE-1 decoding tree needs to be known during the decoding. In the EAGLE-2 this tree is asssembled during the execution by searching for the most probable hypothesis along the beam.
 
-Similarly to ReDrafter, TensorRT-LLM implements the EAGLE model such that logits prediction, draft tokens acceptance and draft token generation are performed inside of the TensorRT engine. EAGLE-1 and EAGLE-2 are both supported, while EAGLE-2 is currently in the experimental stage. Please, visit the [EAGLE README](https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/eagle/README.md) for information about building and running the model.
+Similarly to ReDrafter, TensorRT-LLM implements the EAGLE model such that logits prediction, draft tokens acceptance and draft token generation are performed inside of the TensorRT engine(EAGLE-1 and EAGLE-2 are both supported). Please, visit the [EAGLE README](https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/eagle/README.md) for information about building and running the model.
 
 ## Lookahead Decoding
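To make the "draft tokens acceptance" step mentioned in the hunk above concrete, here is a minimal illustrative sketch in plain Python (not the in-engine TensorRT implementation): under greedy acceptance, a draft token survives only while the target model's own prediction agrees with it; the first mismatch is replaced by the target's token and the remaining drafts are discarded.

```python
# Illustrative sketch of greedy draft-token acceptance (speculative decoding).
# target[i] is the target model's predicted token given the prefix plus draft[:i].
from typing import List


def accept_draft_tokens(draft: List[int], target: List[int]) -> List[int]:
    accepted = []
    for d, t in zip(draft, target):
        if d != t:
            accepted.append(t)  # correction token from the target model
            return accepted
        accepted.append(d)
    # All drafts matched; the target model contributes one bonus token.
    if len(target) > len(draft):
        accepted.append(target[len(draft)])
    return accepted
```

Each accepted prefix costs only one target-model step, which is where the speed-up comes from.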

docs/source/performance/perf-benchmarking.md

Lines changed: 0 additions & 9 deletions
@@ -236,15 +236,6 @@ The following command builds an FP8 quantized engine by specifying the engine tu
 trtllm-bench --model meta-llama/Llama-3.1-8B build --quantization FP8 --max_seq_len 4096 --max_batch_size 1024 --max_num_tokens 2048
 ```
 
-- [Experimental] Build engine with target ISL/OSL for optimization:
-In this experimental mode, you can provide hints to `trtllm-bench`'s tuning heuristic to optimize the engine on specific ISL and OSL targets.
-Generally, the target ISL and OSL aligns with the average ISL and OSL of the dataset, but you can experiment with different values to optimize the engine using this mode.
-The following command builds an FP8 quantized engine and optimizes for ISL:OSL targets of 128:128.
-
-```shell
-trtllm-bench --model meta-llama/Llama-3.1-8B build --quantization FP8 --max_seq_len 4096 --target_isl 128 --target_osl 128
-```
-
 
 #### Parallelism Mapping Support
 The `trtllm-bench build` subcommand supports combinations of tensor-parallel (TP) and pipeline-parallel (PP) mappings as long as the world size (`tp_size x pp_size`) `<=` `8`. The parallelism mapping in build subcommad is controlled by `--tp_size` and `--pp_size` options. The following command builds an engine with TP2-PP2 mapping.
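The world-size constraint quoted in this hunk (`tp_size x pp_size <= 8`) can be sketched as a small command builder. This is a hypothetical helper, not part of `trtllm-bench`; only the `--tp_size` / `--pp_size` flags and the limit of 8 come from the document.

```python
# Hypothetical helper illustrating the parallelism-mapping rule: the world
# size (tp_size x pp_size) must not exceed 8. Flag names mirror the
# `trtllm-bench build` examples in perf-benchmarking.md.
import shlex


def build_command(model: str, tp_size: int = 1, pp_size: int = 1) -> str:
    world_size = tp_size * pp_size
    if world_size > 8:
        raise ValueError(f"world size {world_size} exceeds the supported maximum of 8")
    args = ["trtllm-bench", "--model", model, "build",
            "--tp_size", str(tp_size), "--pp_size", str(pp_size)]
    return shlex.join(args)
```

For example, `build_command("meta-llama/Llama-3.1-8B", tp_size=2, pp_size=2)` yields a TP2-PP2 build command (world size 4), while a TP4-PP4 request is rejected.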

docs/source/torch.md

Lines changed: 1 addition & 5 deletions
@@ -1,11 +1,7 @@
 # PyTorch Backend
 
-```{note}
-Note:
-This feature is currently experimental, and the related API is subjected to change in future versions.
-```
 
-To enhance the usability of the system and improve developer efficiency, TensorRT-LLM launches a new experimental backend based on PyTorch.
+To enhance the usability of the system and improve developer efficiency, TensorRT-LLM launches a new backend based on PyTorch.
 
 The PyTorch backend of TensorRT-LLM is available in version 0.17 and later. You can try it via importing `tensorrt_llm._torch`.
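Since the page states the backend ships only in TensorRT-LLM 0.17 and later, a small version gate before importing `tensorrt_llm._torch` gives a clearer error on older installs. The `supports_pytorch_backend` helper below is hypothetical, not a TensorRT-LLM API; only the 0.17 threshold comes from the document.

```python
# Hypothetical version gate (not a TensorRT-LLM API): `tensorrt_llm._torch`
# only exists in releases 0.17 and later, per docs/source/torch.md.
def supports_pytorch_backend(version: str) -> bool:
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (0, 17)
```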
