Commit c5e10cb

[TRTLLM-4501] feat: AutoTuner tuning config refactor and add tuning for kernel configs.

The motivation for this PR is #4872, in which AutoTuner is applied to the FP8 batched GEMM op with tile_size and epilog_tile_m in the argument list.

* Encoding different configs into a list of numeric tactic IDs starting from 0. This will be implemented inside kernels and used through get_valid_tactics.
* Define each config separately and let AutoTuner iterate over the combinations. This is more readable and flexible. Users can use each part of the config directly. There is no encoding-decoding process.

Add a config entry in the tuning config to define the valid candidates for each part of the config.

* AutoTuner will loop over a search grid generated from the config combinations.
* Each config will be tuned along with the specific input profile.
* The best config will be recorded in the cache value (instead of the cache key). It will then be recovered and used in the tunable runner's forward.

Other enhancements:

* Use a decorator to make the tuning config definition more natural and efficient. This is an independent enhancement.
* Allow the user to not specify the gen_tuning_buckets or the map_to_tuning_buckets function.
* Code refactoring.

Signed-off-by: Yukun He <[email protected]>
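The search-grid idea described in the message can be illustrated with a minimal, self-contained sketch. This is not the AutoTuner API from this commit: `CONFIG_CANDIDATES`, `config_grid`, `tune`, `forward`, and the `measure` hook are hypothetical names chosen for illustration, and the candidate values are made up. The sketch only shows the general pattern of iterating over config combinations per input profile and storing the winner in the cache value.

```python
import itertools
import time

# Hypothetical candidates for each part of the kernel config (values are made up).
CONFIG_CANDIDATES = {
    "tile_size": [8, 16, 32],
    "epilog_tile_m": [64, 128],
}


def config_grid(candidates):
    """Yield every combination of candidate values as a plain dict."""
    names = list(candidates)
    for values in itertools.product(*(candidates[name] for name in names)):
        yield dict(zip(names, values))


def wall_clock(runner, profile, config):
    """Default cost: wall-clock time of one call (an assumption, not the real profiler)."""
    start = time.perf_counter()
    runner(profile, **config)
    return time.perf_counter() - start


def tune(runner, profile, candidates, cache, measure=wall_clock):
    """Try every config for one input profile and record the cheapest one."""
    best_config, best_cost = None, float("inf")
    for config in config_grid(candidates):
        cost = measure(runner, profile, config)
        if cost < best_cost:
            best_config, best_cost = config, cost
    # The profile is the cache key; the winning config lives in the cache value.
    cache[profile] = best_config
    return best_config


def forward(runner, profile, cache, default_config):
    """Recover the tuned config from the cache, falling back to a default."""
    config = cache.get(profile, default_config)
    return runner(profile, **config)
```

Because each part of the config stays a named field, the runner receives `tile_size` and `epilog_tile_m` directly as keyword arguments, with no tactic-ID decoding step in between.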

1 parent 134b238 commit c5e10cb

5 files changed: +269 -137 lines changed

tensorrt_llm/_torch
- autotuner.py
- custom_ops
  - torch_custom_ops.py
  - trtllm_gen_custom_ops.py
- utils.py
tests/unittest/_torch
- test_autotuner.py