
Commit d69b27e

doc: remove outdated features marked as Experimental
Signed-off-by: nv-guomingz <[email protected]>
1 parent c7ffadf commit d69b27e

File tree

4 files changed

+2
-17
lines changed


docs/source/advanced/gpt-attention.md

Lines changed: 0 additions & 2 deletions
@@ -112,8 +112,6 @@ printed.
 #### XQA Optimization
 
 Another optimization for MQA/GQA in generation phase called XQA optimization.
-It is still experimental feature and support limited configurations. LLAMA2 70B
-is one model that it supports.
 
 Support matrix of the XQA optimization:
 - FP16 / BF16 compute data type.
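XQA kernel selection happens at runtime, not at build time. A minimal sketch of steering it from Python, assuming the `TRTLLM_FORCE_XQA` / `TRTLLM_DISABLE_XQA` environment variables described elsewhere in gpt-attention.md (the variable names are an assumption and may differ across TensorRT-LLM versions):

```python
# Sketch only: TRTLLM_FORCE_XQA / TRTLLM_DISABLE_XQA are assumed names taken
# from gpt-attention.md; confirm them against your TensorRT-LLM version.
import os
from typing import Optional


def configure_xqa(force: Optional[bool]) -> dict:
    """Set env vars steering XQA kernel selection; None leaves the heuristic alone."""
    env = {}
    if force is True:
        env["TRTLLM_FORCE_XQA"] = "1"    # always try the XQA kernels
    elif force is False:
        env["TRTLLM_DISABLE_XQA"] = "1"  # never use the XQA kernels
    os.environ.update(env)
    return env
```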

docs/source/advanced/speculative-decoding.md

Lines changed: 1 addition & 1 deletion
@@ -167,7 +167,7 @@ TensorRT-LLM implements the ReDrafter model such that logits prediction, beam se
 
 The EAGLE approach enhances the single-model Medusa method by predicting and verifying tokens using the same model. Similarly to ReDrafter, it predicts draft tokens using a recurrent predictor where each draft token depends on the previous one. However, unlike ReDrafter, it uses a single-layer transformer model to predict draft tokens from previous hidden states and decoded tokens. In the EAGLE-1 decoding tree needs to be known during the decoding. In the EAGLE-2 this tree is asssembled during the execution by searching for the most probable hypothesis along the beam.
 
-Similarly to ReDrafter, TensorRT-LLM implements the EAGLE model such that logits prediction, draft tokens acceptance and draft token generation are performed inside of the TensorRT engine. EAGLE-1 and EAGLE-2 are both supported, while EAGLE-2 is currently in the experimental stage. Please, visit the [EAGLE README](https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/eagle/README.md) for information about building and running the model.
+Similarly to ReDrafter, TensorRT-LLM implements the EAGLE model such that logits prediction, draft tokens acceptance and draft token generation are performed inside of the TensorRT engine(EAGLE-1 and EAGLE-2 are both supported). Please, visit the [EAGLE README](https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/eagle/README.md) for information about building and running the model.
 
 ## Lookahead Decoding
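To make the "draft tokens acceptance" step mentioned in the hunk above concrete, here is a minimal illustrative sketch in plain Python (not the in-engine TensorRT implementation): under greedy acceptance, a draft token survives only while the target model's own prediction agrees with it; the first mismatch is replaced by the target's token and the remaining drafts are discarded.

```python
# Illustrative sketch of greedy draft-token acceptance (speculative decoding).
# target[i] is the target model's predicted token given the prefix plus draft[:i].
from typing import List


def accept_draft_tokens(draft: List[int], target: List[int]) -> List[int]:
    accepted = []
    for d, t in zip(draft, target):
        if d != t:
            accepted.append(t)  # correction token from the target model
            return accepted
        accepted.append(d)
    # All drafts matched; the target model contributes one bonus token.
    if len(target) > len(draft):
        accepted.append(target[len(draft)])
    return accepted
```

Each accepted prefix costs only one target-model step, which is where the speed-up comes from.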

docs/source/performance/perf-benchmarking.md

Lines changed: 0 additions & 9 deletions
@@ -236,15 +236,6 @@ The following command builds an FP8 quantized engine by specifying the engine tu
 trtllm-bench --model meta-llama/Llama-3.1-8B build --quantization FP8 --max_seq_len 4096 --max_batch_size 1024 --max_num_tokens 2048
 ```
 
-- [Experimental] Build engine with target ISL/OSL for optimization:
-In this experimental mode, you can provide hints to `trtllm-bench`'s tuning heuristic to optimize the engine on specific ISL and OSL targets.
-Generally, the target ISL and OSL aligns with the average ISL and OSL of the dataset, but you can experiment with different values to optimize the engine using this mode.
-The following command builds an FP8 quantized engine and optimizes for ISL:OSL targets of 128:128.
-
-```shell
-trtllm-bench --model meta-llama/Llama-3.1-8B build --quantization FP8 --max_seq_len 4096 --target_isl 128 --target_osl 128
-```
-
 
 #### Parallelism Mapping Support
 The `trtllm-bench build` subcommand supports combinations of tensor-parallel (TP) and pipeline-parallel (PP) mappings as long as the world size (`tp_size x pp_size`) `<=` `8`. The parallelism mapping in build subcommad is controlled by `--tp_size` and `--pp_size` options. The following command builds an engine with TP2-PP2 mapping.
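The world-size constraint quoted in this hunk (`tp_size x pp_size <= 8`) can be sketched as a small command builder. This is a hypothetical helper, not part of `trtllm-bench`; only the `--tp_size` / `--pp_size` flags and the limit of 8 come from the document.

```python
# Hypothetical helper illustrating the parallelism-mapping rule: the world
# size (tp_size x pp_size) must not exceed 8. Flag names mirror the
# `trtllm-bench build` examples in perf-benchmarking.md.
import shlex


def build_command(model: str, tp_size: int = 1, pp_size: int = 1) -> str:
    world_size = tp_size * pp_size
    if world_size > 8:
        raise ValueError(f"world size {world_size} exceeds the supported maximum of 8")
    args = ["trtllm-bench", "--model", model, "build",
            "--tp_size", str(tp_size), "--pp_size", str(pp_size)]
    return shlex.join(args)
```

For example, `build_command("meta-llama/Llama-3.1-8B", tp_size=2, pp_size=2)` yields a TP2-PP2 build command (world size 4), while a TP4-PP4 request is rejected.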

docs/source/torch.md

Lines changed: 1 addition & 5 deletions
@@ -1,11 +1,7 @@
 # PyTorch Backend
 
-```{note}
-Note:
-This feature is currently experimental, and the related API is subjected to change in future versions.
-```
 
-To enhance the usability of the system and improve developer efficiency, TensorRT-LLM launches a new experimental backend based on PyTorch.
+To enhance the usability of the system and improve developer efficiency, TensorRT-LLM launches a new backend based on PyTorch.
 
 The PyTorch backend of TensorRT-LLM is available in version 0.17 and later. You can try it via importing `tensorrt_llm._torch`.
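Since the page states the backend ships only in TensorRT-LLM 0.17 and later, a small version gate before importing `tensorrt_llm._torch` gives a clearer error on older installs. The `supports_pytorch_backend` helper below is hypothetical, not a TensorRT-LLM API; only the 0.17 threshold comes from the document.

```python
# Hypothetical version gate (not a TensorRT-LLM API): `tensorrt_llm._torch`
# only exists in releases 0.17 and later, per docs/source/torch.md.
def supports_pytorch_backend(version: str) -> bool:
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (0, 17)
```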
