
Conversation

@hyukn commented Jun 16, 2025

The motivation for this PR is #4872, in which AutoTuner is applied to the FP8 batched GEMM op with tile_size and epilog_tile_m in the argument list. Generally, there are two possible implementations:

  • Encode the different configs as a list of numeric tactic IDs starting from 0. The encoding is implemented inside the kernels and exposed through get_valid_tactics.
  • Define each part of the config separately and let AutoTuner iterate over the combinations. This is more readable and flexible: users can access each part of the config directly, and there is no encoding/decoding step.
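The two options can be sketched side by side. Every name below (the candidate values, get_valid_tactics, decode_tactic, iter_configs) is illustrative only, not the actual TensorRT-LLM API:

```python
from itertools import product

# Hypothetical candidate values for two config knobs.
TILE_SIZES = [8, 16, 32]
EPILOG_TILE_MS = [64, 128]

# Option 1: flatten every combination into a numeric tactic ID starting
# from 0; the kernel side would expose the IDs via get_valid_tactics(),
# and each opaque ID must be decoded back into a concrete config.
_ALL_CONFIGS = list(product(TILE_SIZES, EPILOG_TILE_MS))

def get_valid_tactics():
    return list(range(len(_ALL_CONFIGS)))

def decode_tactic(tactic_id):
    return _ALL_CONFIGS[tactic_id]

# Option 2: iterate over the named combinations directly; each part of
# the config is usable as-is, with no encode/decode round trip.
def iter_configs():
    for tile_size, epilog_tile_m in product(TILE_SIZES, EPILOG_TILE_MS):
        yield {"tile_size": tile_size, "epilog_tile_m": epilog_tile_m}
```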

We choose the second method: add a config entry to the tuning config that defines the valid candidates for each part of the config.

  • AutoTuner loops over a search grid generated from the config combinations.
  • Each config is tuned together with the specific input profile.
  • The best config is recorded in the cache value (instead of the cache key), and it is recovered and used in the tunable runner's forward.
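The grid-search-plus-cache flow above can be sketched as follows, with hypothetical function and cache names (the real AutoTuner internals differ, and the cost function here is a deterministic stand-in for actual kernel profiling):

```python
import itertools

def fake_cost(inputs, tile_size, epilog_tile_m):
    # Stand-in for a profiled kernel: pretend larger tiles run faster here.
    return 1.0 / (tile_size * epilog_tile_m)

def tune(profile_fn, inputs, config_candidates, cache, cache_key):
    """Build a search grid from the candidate lists, profile every
    combination for this input profile, and store the winning config in
    the cache *value*; the key only identifies the input profile."""
    names = list(config_candidates)
    best_cfg, best_cost = None, float("inf")
    for values in itertools.product(*(config_candidates[n] for n in names)):
        cfg = dict(zip(names, values))
        cost = profile_fn(inputs, **cfg)  # e.g. measured kernel latency
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    cache[cache_key] = best_cfg  # recovered later in the runner's forward
    return best_cfg
```

With candidates such as `{"tile_size": [8, 16], "epilog_tile_m": [64, 128]}` the grid has four entries, and the cheapest one ends up in the cache value for that input profile.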

Other enhancements:

  • Use a decorator to make the tuning config definition more natural and efficient. This is an independent enhancement.
  • Allow the user to omit the gen_tuning_buckets or map_to_tuning_buckets function.
  • Code refactoring.
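One hedged sketch of what a decorator-based definition with optional bucket helpers might look like; the decorator name, the attached attribute, the defaults, and the runner below are assumptions for illustration, not the real API:

```python
def tuning_config(*, configs, gen_tuning_buckets=None,
                  map_to_tuning_buckets=None):
    """Attach a tuning config to a runner function. Both bucket helpers
    are optional: omitted ones fall back to trivial defaults (no
    precomputed buckets; identity mapping of observed dimensions)."""
    def decorate(fn):
        fn.tuning_config = {
            "configs": configs,
            "gen_tuning_buckets": gen_tuning_buckets or (lambda: ()),
            "map_to_tuning_buckets": map_to_tuning_buckets or (lambda d: d),
        }
        return fn
    return decorate

@tuning_config(configs={"tile_size": [8, 16, 32]})
def batched_gemm(x, *, tile_size=8):
    return x * tile_size  # placeholder for the real op
```

The win over a plain constructor call is that the config sits right next to the runner it tunes, and the tuner can read it back from the function object.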

@hyukn hyukn requested review from DomBrown and litaotju June 16, 2025 07:48
@hyukn hyukn changed the title [TRTLLM-4501] DO NOT MERGE: AutoTuner tuning config refactor and add tuning for kernel configs. [TRTLLM-4501] fit: AutoTuner tuning config refactor and add tuning for kernel configs. Jun 16, 2025
@hyukn hyukn force-pushed the feat/autotuner_tunable_configs branch from bb2e5bf to 0f5a3e3 Compare June 19, 2025 03:36
@hyukn hyukn force-pushed the feat/autotuner_tunable_configs branch 2 times, most recently from a982cfa to 54cd4fa Compare July 4, 2025 07:16
@hyukn commented Jul 4, 2025

/bot run

@tensorrt-cicd

PR_Github #10964 [ run ] triggered by Bot

@hyukn hyukn force-pushed the feat/autotuner_tunable_configs branch 2 times, most recently from 349dc14 to c5e10cb Compare July 4, 2025 08:49
@hyukn commented Jul 4, 2025

/bot run

@tensorrt-cicd

PR_Github #10979 [ run ] triggered by Bot

@tensorrt-cicd

PR_Github #10964 [ run ] completed with state ABORTED

@tensorrt-cicd

PR_Github #10979 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #8109 completed with status: 'FAILURE'

@hyukn hyukn force-pushed the feat/autotuner_tunable_configs branch from c5e10cb to 1e03e95 Compare July 7, 2025 03:16
@hyukn commented Jul 7, 2025

/bot run

@tensorrt-cicd

PR_Github #11098 [ run ] triggered by Bot

@tensorrt-cicd

PR_Github #11098 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8205 completed with status: 'FAILURE'

@hyukn hyukn force-pushed the feat/autotuner_tunable_configs branch from 1e03e95 to a6d1443 Compare July 7, 2025 06:22
@hyukn commented Jul 7, 2025

/bot run

@tensorrt-cicd

PR_Github #11109 [ run ] triggered by Bot

@hyukn hyukn force-pushed the feat/autotuner_tunable_configs branch from a6d1443 to e8b3f23 Compare July 7, 2025 06:30
@hyukn commented Jul 7, 2025

/bot run

@tensorrt-cicd

PR_Github #11111 [ run ] triggered by Bot

@tensorrt-cicd

PR_Github #11109 [ run ] completed with state ABORTED

@tensorrt-cicd

PR_Github #11111 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8217 completed with status: 'FAILURE'

@hyukn commented Jul 7, 2025

/bot run

@tensorrt-cicd

PR_Github #11168 [ run ] triggered by Bot

@tensorrt-cicd

PR_Github #11168 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8261 completed with status: 'FAILURE'

@hyukn hyukn force-pushed the feat/autotuner_tunable_configs branch from e8b3f23 to a661977 Compare July 8, 2025 02:35
@hyukn hyukn force-pushed the feat/autotuner_tunable_configs branch from 1594f3c to 2bd1b4f Compare July 8, 2025 03:08
@hyukn commented Jul 8, 2025

/bot run

@tensorrt-cicd

PR_Github #11203 [ run ] triggered by Bot

@tensorrt-cicd

PR_Github #11203 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8287 completed with status: 'FAILURE'

@hyukn hyukn force-pushed the feat/autotuner_tunable_configs branch from 2bd1b4f to 7e5acba Compare July 8, 2025 06:49
@hyukn commented Jul 8, 2025

/bot run --disable-fail-fast

@tensorrt-cicd

PR_Github #11234 [ run ] triggered by Bot

@tensorrt-cicd

PR_Github #11234 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8309 completed with status: 'FAILURE'

@hyukn hyukn force-pushed the feat/autotuner_tunable_configs branch from 7e5acba to e298d0c Compare July 9, 2025 05:07
…r kernel configs.

Adding a config entry in the tuning config to define the valid candidates for each part of the configs.
* AutoTuner will loop over a search grid generated from the config combinations.
* Each config will be tuned along with the specific input profile.
* The best config will be recorded in the cache value (instead of the cache key). And it will be recovered and used in the tunable runner forward.

Other enhancements:
* Use the decorator to make the tuning config definition more natural and efficient. This is an independent enhancement.
* Allow the user to not specify the gen_tuning_buckets or the map_to_tuning_buckets function.
* Code refactoring.

Signed-off-by: Yukun He <[email protected]>
@hyukn hyukn force-pushed the feat/autotuner_tunable_configs branch from e298d0c to 2626d9b Compare July 9, 2025 08:38
@hyukn commented Jul 9, 2025

/bot run --disable-fail-fast

@tensorrt-cicd

PR_Github #11414 [ run ] triggered by Bot

@tensorrt-cicd

PR_Github #11414 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #8442 completed with status: 'FAILURE'

@hyukn hyukn marked this pull request as ready for review July 10, 2025 01:38
@hyukn hyukn requested a review from a team as a code owner July 10, 2025 01:38
@hyukn hyukn requested a review from HuiGao-NV July 10, 2025 01:38
@hyukn commented Jul 10, 2025

/bot run --disable-fail-fast

@hyukn hyukn requested a review from limin2021 July 10, 2025 01:43
@tensorrt-cicd

PR_Github #11483 [ run ] triggered by Bot

@tensorrt-cicd

PR_Github #11483 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8496 completed with status: 'SUCCESS'

@hyukn hyukn changed the title [TRTLLM-4501] fit: AutoTuner tuning config refactor and add tuning for kernel configs. [TRTLLM-4501] feat: AutoTuner tuning config refactor and add tuning for kernel configs. Aug 1, 2025
@hyukn hyukn changed the title [TRTLLM-4501] feat: AutoTuner tuning config refactor and add tuning for kernel configs. [TRTLLM-4501][feat]: AutoTuner tuning config refactor and add tuning for kernel configs. Aug 1, 2025
@hyukn hyukn changed the title [TRTLLM-4501][feat]: AutoTuner tuning config refactor and add tuning for kernel configs. [TRTLLM-4501][feat] AutoTuner tuning config refactor and add tuning for kernel configs. Aug 1, 2025

2 participants