hyukn
Collaborator

@hyukn hyukn commented Jun 12, 2025

Revise parts of the tunable FP8 batched GEMM op implementation to reduce the Python overhead introduced by the original #4872.
The TuningConfig class is hashable, so instances with the same contents produce the same hash value. This does not cause any cache key misses, but it can add extra overhead during the inference phase.
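
To illustrate the trade-off described above, here is a minimal sketch. It assumes a hypothetical stand-in for `TuningConfig` written as a frozen dataclass; the field names, the `cache` dict, and both lookup helpers are illustrative assumptions, not the actual TensorRT-LLM code. It shows that two configs with equal contents hit the same cache entry, and that rebuilding the config on every call is avoidable work on the inference path.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


# Hypothetical stand-in, NOT the real TensorRT-LLM TuningConfig.
# frozen=True with the default eq=True makes the dataclass hashable by value.
@dataclass(frozen=True)
class TuningConfig:
    dynamic_dims: Tuple[int, ...] = ()
    constraints: Tuple[str, ...] = ()


# Illustrative cache mapping (config, input shape) to a chosen tactic name.
cache: Dict[tuple, str] = {}


def lookup_rebuilding(shape: Tuple[int, ...]) -> str:
    # A new config is built on every call. Equal contents hash equally,
    # so the cache still hits, but construction is paid on each call.
    key = (TuningConfig(dynamic_dims=(0,)), shape)
    return cache.setdefault(key, "tactic-0")


# Built once at import time and reused by every call.
_SHARED_CONFIG = TuningConfig(dynamic_dims=(0,))


def lookup_shared(shape: Tuple[int, ...]) -> str:
    # Reuses the shared instance, so no config is constructed per call.
    # The dict lookup still hashes the key.
    key = (_SHARED_CONFIG, shape)
    return cache.setdefault(key, "tactic-0")


if __name__ == "__main__":
    lookup_rebuilding((8, 16))
    lookup_shared((8, 16))
    print(len(cache))  # both lookups share one cache entry
```

Both helpers resolve to the same entry; the difference is only in how much Python work happens before the lookup.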

@hyukn hyukn requested a review from DomBrown June 12, 2025 01:30
@hyukn hyukn requested a review from a team as a code owner June 12, 2025 01:30
@hyukn hyukn requested review from pcastonguay and liji-nv June 12, 2025 01:30
@hyukn hyukn requested a review from litaotju June 12, 2025 01:42
@hyukn hyukn force-pushed the feat/improve_fp8_bmm_tuning branch 2 times, most recently from 26bb347 to 2d0ac26 Compare June 12, 2025 02:26
@hyukn
Collaborator Author

hyukn commented Jun 12, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #8578 [ run ] triggered by Bot

@hyukn hyukn force-pushed the feat/improve_fp8_bmm_tuning branch 2 times, most recently from 2c17b89 to f52d7d0 Compare June 12, 2025 04:16
@hyukn
Collaborator Author

hyukn commented Jun 12, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #8600 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #8600 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #6241 completed with status: 'ABORTED'

@hyukn hyukn force-pushed the feat/improve_fp8_bmm_tuning branch 2 times, most recently from 2769352 to fc0c839 Compare June 13, 2025 02:44
@hyukn
Collaborator Author

hyukn commented Jun 13, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #8724 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #8724 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6328 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@hyukn hyukn force-pushed the feat/improve_fp8_bmm_tuning branch from fc0c839 to d0e3b33 Compare June 16, 2025 01:31
@hyukn
Collaborator Author

hyukn commented Jun 16, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #8950 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #8950 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6530 completed with status: 'FAILURE'

@hyukn hyukn force-pushed the feat/improve_fp8_bmm_tuning branch from d0e3b33 to 0894c7e Compare June 16, 2025 07:51
@hyukn
Collaborator Author

hyukn commented Jun 16, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #8995 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #8995 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6567 completed with status: 'FAILURE'

@hyukn
Collaborator Author

hyukn commented Jun 17, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #9071 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #9071 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #6633 completed with status: 'FAILURE'

@hyukn
Collaborator Author

hyukn commented Jun 17, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #9083 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #9083 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #6645 completed with status: 'FAILURE'

@hyukn hyukn force-pushed the feat/improve_fp8_bmm_tuning branch from e3258ad to 5e301d1 Compare June 17, 2025 03:38
@hyukn
Collaborator Author

hyukn commented Jun 17, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #9116 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #9116 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #6670 completed with status: 'FAILURE'

@hyukn hyukn force-pushed the feat/improve_fp8_bmm_tuning branch from 5e301d1 to c7510d0 Compare June 17, 2025 09:29
@hyukn
Collaborator Author

hyukn commented Jun 17, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #9183 [ run ] triggered by Bot

@hyukn
Collaborator Author

hyukn commented Jun 17, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #9197 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #9183 [ run ] completed with state ABORTED
/LLM/main/L0_MergeRequest_PR pipeline #6728 completed with status: 'FAILURE'

@DomBrown
Collaborator

/bot run

@tensorrt-cicd
Collaborator

PR_Github #9233 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #9233 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #6771 completed with status: 'FAILURE'

@DomBrown
Collaborator

/bot run

@tensorrt-cicd
Collaborator

PR_Github #9242 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #9242 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6779 completed with status: 'FAILURE'

@hyukn hyukn force-pushed the feat/improve_fp8_bmm_tuning branch from c7510d0 to 7f869dd Compare June 18, 2025 01:15
@hyukn
Collaborator Author

hyukn commented Jun 18, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #9263 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #9263 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6796 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@hyukn hyukn enabled auto-merge (squash) June 18, 2025 06:33
@hyukn hyukn merged commit 6711ad9 into NVIDIA:main Jun 18, 2025
3 checks passed
4 participants