Commit 349dc14

[TRTLLM-4501] feat: AutoTuner tuning config refactor and add tuning for kernel configs.
The motivation for this PR is NVIDIA#4872, in which AutoTuner is applied to the FP8 batched GEMM op with tile_size and epilog_tile_m in the argument list.

* Encoding different configs into a list of numeric tactic IDs starting from 0. This will be implemented inside kernels and used through get_valid_tactics.
* Define each config separately and let AutoTuner iterate over the combinations. This is more readable and flexible. Users can use each part of the config directly. There is no encoding-decoding process.

Add a config entry in the tuning config to define the valid candidates for each part of the config.

* AutoTuner will loop over a search grid generated from the config combinations.
* Each config will be tuned along with the specific input profile.
* The best config will be recorded in the cache value (instead of the cache key), and it will be recovered and used in the tunable runner's forward.

Other enhancements:

* Use the decorator to make the tuning config definition more natural and efficient. This is an independent enhancement.
* Allow the user to not specify the gen_tuning_buckets or the map_to_tuning_buckets function.
* Code refactoring.

Signed-off-by: Yukun He <[email protected]>
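The search-grid idea in the message above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the AutoTuner code from this commit: `build_search_grid`, `tune`, the `cost_fn` callback, and the plain-dict cache are hypothetical names introduced here, and only `tile_size` and `epilog_tile_m` come from the commit message. The sketch shows the two points the message stresses: the candidates for each part of the config are listed separately and combined into a grid, and the winning config is stored in the cache value while the cache key holds only the input profile.

```python
import itertools
from typing import Any, Callable, Dict, Iterator, List, Tuple

Config = Dict[str, Any]
Profile = Tuple[int, ...]  # hypothetical: an input profile as a tuple of dims


def build_search_grid(candidates: Dict[str, List[Any]]) -> Iterator[Config]:
    """Yield every combination of the per-part candidate values."""
    names = list(candidates)
    for values in itertools.product(*(candidates[name] for name in names)):
        yield dict(zip(names, values))


def tune(
    profile: Profile,
    candidates: Dict[str, List[Any]],
    cost_fn: Callable[[Profile, Config], float],
    cache: Dict[Profile, Config],
) -> Config:
    """Return the cheapest config for `profile`, memoized in `cache`.

    The cache key is the input profile only; the best config is the value.
    `cost_fn` stands in for whatever profiling the real tuner performs.
    """
    if profile in cache:
        return cache[profile]
    best_config = min(
        build_search_grid(candidates),
        key=lambda config: cost_fn(profile, config),
    )
    cache[profile] = best_config
    return best_config


if __name__ == "__main__":
    # Hypothetical candidate values and a made-up cost model for the demo.
    demo_candidates = {"tile_size": [8, 16, 32, 64], "epilog_tile_m": [64, 128]}

    def demo_cost(profile: Profile, config: Config) -> float:
        return abs(config["tile_size"] - 32) + abs(config["epilog_tile_m"] - 128)

    demo_cache: Dict[Profile, Config] = {}
    print(tune((4, 256, 512), demo_candidates, demo_cost, demo_cache))
```

Because each part of the config keeps its own name, a runner can read `config["tile_size"]` directly at forward time, with no decoding of a numeric tactic ID.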
1 parent 134b238 commit 349dc14

5 files changed, +265 -135 lines changed
