[Misc] Use torch.compile for GemmaRMSNorm #7642

WoosukKwon · 2024-08-19T01:52:16Z

This PR is a temporary solution to accelerate Gemma models. The PR can be reverted once #7110 is merged.

github-actions · 2024-08-19T01:52:28Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build on the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

Comment /ready on the PR
Add ready label to the PR
Enable auto-merge.

🚀

youkaichao · 2024-08-20T07:30:58Z

vllm/model_executor/layers/layernorm.py

+        residual: Optional[torch.Tensor] = None,
+    ) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
+        """PyTorch-native implementation equivalent to forward()."""
+        return self.forward_static(self.weight.data, self.variance_epsilon, x,


Even though this is a static function, I'm not sure whether calling it through `self` would cause a problem here.

If you want to be safe, I think you can move this function outside of the class definition.

WoosukKwon · 2024-08-22

I think this should be OK, since the function does not touch any state under `self`. I also checked that re-compilation does not happen after graph capturing, by monitoring the logs with `TORCH_LOGS=guards`. In addition, the ShareGPT throughput benchmark shows a 10~15% improvement.
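For readers following the thread, here is a minimal sketch of the pattern under discussion: a Gemma-style RMSNorm whose math lives in a static method that receives the weight and epsilon as explicit arguments, so the compiled function never reads state from `self`. This is not the code merged in the PR. The class name `SketchGemmaRMSNorm`, the `use_compile` flag, and the `_norm` attribute are hypothetical, and the `1 + weight` scaling is assumed from the usual Gemma RMSNorm formulation.

```python
from typing import Optional, Tuple, Union

import torch
from torch import nn


class SketchGemmaRMSNorm(nn.Module):
    """Illustrative stand-in for the layer discussed above (not vLLM's class)."""

    def __init__(self, hidden_size: int, eps: float = 1e-6,
                 use_compile: bool = False) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(hidden_size))
        self.variance_epsilon = eps
        # Hypothetical switch: wrap the static function with torch.compile.
        # Compilation itself is deferred until the first call.
        self._norm = (torch.compile(self.forward_static)
                      if use_compile else self.forward_static)

    @staticmethod
    def forward_static(
        weight: torch.Tensor,
        variance_epsilon: float,
        x: torch.Tensor,
        residual: Optional[torch.Tensor],
    ) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
        # Everything the computation needs arrives as an argument,
        # so nothing under `self` is touched inside the traced region.
        orig_dtype = x.dtype
        if residual is not None:
            x = x + residual
            residual = x
        x = x.float()
        variance = x.pow(2).mean(dim=-1, keepdim=True)
        x = x * torch.rsqrt(variance + variance_epsilon)
        # Assumed Gemma convention: scale by (1 + weight).
        x = x * (1.0 + weight.float())
        x = x.to(orig_dtype)
        return x if residual is None else (x, residual)

    def forward(
        self,
        x: torch.Tensor,
        residual: Optional[torch.Tensor] = None,
    ) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
        return self._norm(self.weight.data, self.variance_epsilon, x, residual)
```

Constructing the layer with `use_compile=True` and running the script with the `TORCH_LOGS=guards` environment variable set should print the guards installed for the compiled function, which is one way to watch for unexpected re-compilation as described above. Moving `forward_static` to module level, as suggested in the review, would leave the call site unchanged apart from dropping the `self.` prefix.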

youkaichao

LGTM as a temporary solution.

[Misc] Use torch.compile for GemmaRMSNorm

786fbbd

WoosukKwon added the ready label (ONLY add when PR is ready to merge/full CI is needed) Aug 19, 2024

WoosukKwon requested a review from youkaichao August 19, 2024 01:56

WoosukKwon mentioned this pull request Aug 19, 2024

Release v0.5.5 #7481

Closed

youkaichao reviewed Aug 20, 2024

vllm/model_executor/layers/layernorm.py (outdated, resolved)

youkaichao reviewed Aug 20, 2024

WoosukKwon added 2 commits August 21, 2024 17:54

Merge branch 'main' into gemma-rms

cfc68b4

Fix

e817b74

WoosukKwon requested a review from youkaichao August 22, 2024 01:05

youkaichao approved these changes Aug 22, 2024

WoosukKwon merged commit b3856be into main Aug 22, 2024

WoosukKwon deleted the gemma-rms branch August 22, 2024 08:14

Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024

[Misc] Use torch.compile for GemmaRMSNorm (vllm-project#7642)

222df94

Signed-off-by: Alvant <[email protected]>

LeiWang1999 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Mar 26, 2025

[Misc] Use torch.compile for GemmaRMSNorm (vllm-project#7642)

b5cdeb1

Signed-off-by: LeiWang1999 <[email protected]>
