[TRTLLM-4501] fit: AutoTuner tuning config refactor and add tuning for kernel configs.
This PR is motivated by #4872, which applies AutoTuner to the FP8 batched GEMM op and needs tile_size and epilog_tile_m to be passed in the argument list.
* Encoding different configs into a list of numeric tactic IDs starting from 0. This will be implemented inside kernels and used through get_valid_tactics.
* Define each config separately and let AutoTuner iterate over the combinations. This is more readable and flexible. Users can use each part of the config directly. There is no encoding-decoding process.
Add a config entry in the tuning config to define the valid candidates for each part of the config.
* AutoTuner will loop over a search grid generated from the config combinations.
* Each config will be tuned along with the specific input profile.
* The best config will be recorded in the cache value (instead of the cache key), then recovered and used in the tunable runner's forward.
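
The search-grid flow described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual AutoTuner code: `build_search_grid`, `tune`, `toy_cost`, the cache layout, and the candidate values are all hypothetical names and numbers chosen for the example. Only tile_size and epilog_tile_m come from the description.

```python
import itertools
from typing import Any, Callable, Dict, Hashable, List


def build_search_grid(config_candidates: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Expand per-field candidate lists into every combination (hypothetical helper)."""
    names = list(config_candidates)
    value_lists = [config_candidates[name] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


def tune(
    profile_key: Hashable,
    config_candidates: Dict[str, List[Any]],
    cost_fn: Callable[..., float],
    cache: Dict[Hashable, Dict[str, Any]],
) -> Dict[str, Any]:
    """Try every config for one input profile and store the best one as the cache VALUE."""
    best_cfg, best_cost = None, float("inf")
    for cfg in build_search_grid(config_candidates):
        cost = cost_fn(**cfg)  # stand-in for profiling a kernel with this config
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    cache[profile_key] = best_cfg  # the key is only the input profile, not the config
    return best_cfg


def toy_cost(tile_size: int, epilog_tile_m: int) -> float:
    """Made-up cost model standing in for a measured kernel runtime."""
    return abs(tile_size - 64) + abs(epilog_tile_m - 128)


# Assumed candidate values, for illustration only.
candidates = {"tile_size": [32, 64, 128], "epilog_tile_m": [64, 128]}
cache: Dict[Hashable, Dict[str, Any]] = {}
best = tune(("fp8_bmm", (8, 256, 256)), candidates, toy_cost, cache)
```

Because each config field keeps its own name, a runner's forward could read `cache[profile_key]["tile_size"]` directly, with no tactic-ID decoding step.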
Other enhancements:
* Use the decorator to make the tuning config definition more natural and efficient. This is an independent enhancement.
* Allow the user to omit the gen_tuning_buckets or the map_to_tuning_buckets function.
* Code refactoring.
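
One way the optional bucket functions could behave is sketched below. The `DynamicDimSpec` class and its fallback behavior (tune only the observed size, map each value to itself) are assumptions made for illustration; the description does not state what the real defaults are.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class DynamicDimSpec:
    """Hypothetical spec for one dynamic dimension; both callbacks may be omitted."""

    gen_tuning_buckets: Optional[Callable[[int], Tuple[int, ...]]] = None
    map_to_tuning_buckets: Optional[Callable[[int], int]] = None

    def buckets(self, observed: int) -> Tuple[int, ...]:
        # Assumed fallback: with no generator, tune only the observed size.
        if self.gen_tuning_buckets is None:
            return (observed,)
        return tuple(self.gen_tuning_buckets(observed))

    def bucket_for(self, value: int) -> int:
        # Assumed fallback: with no mapper, each value is its own bucket.
        if self.map_to_tuning_buckets is None:
            return value
        return self.map_to_tuning_buckets(value)


# A spec that specifies neither function.
default_spec = DynamicDimSpec()

# A spec that buckets by powers of two (illustrative choice).
pow2_spec = DynamicDimSpec(
    gen_tuning_buckets=lambda n: tuple(2**i for i in range(n.bit_length())),
    map_to_tuning_buckets=lambda v: 1 << (v - 1).bit_length(),
)
```

With these assumed defaults, `default_spec.buckets(8)` yields a single bucket for the observed size, while `pow2_spec.buckets(8)` yields the powers of two up to 8.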
Signed-off-by: Yukun He <[email protected]>