[core] gemma2 full context length support #10584

youkaichao · 2024-11-22T23:57:14Z

The scheduler now treats Gemma 2 as a model without a sliding window; the sliding window is applied only inside the attention computation.

FIX #6220
FIX #8580

Signed-off-by: youkaichao <[email protected]>
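To make the description above concrete, here is a minimal sketch of why the scheduler can ignore the sliding window for a model whose layers alternate between local and global attention. It is not the vLLM implementation: the function names are hypothetical, and the even/odd layer rule is an assumption made for illustration.

```python
# Illustrative sketch only; names and the even/odd layer rule are assumptions.

def layer_uses_sliding_window(layer_idx: int) -> bool:
    # Assumption: even-indexed layers use local (sliding window) attention,
    # odd-indexed layers use global (full) attention.
    return layer_idx % 2 == 0


def visible_positions(query_pos: int, layer_idx: int, sliding_window: int) -> range:
    """Key positions a causal query at `query_pos` attends to in one layer."""
    if layer_uses_sliding_window(layer_idx):
        # Local layer: only the most recent `sliding_window` tokens are visible.
        start = max(0, query_pos - sliding_window + 1)
    else:
        # Global layer: every earlier token is visible.
        start = 0
    return range(start, query_pos + 1)


def kv_tokens_to_keep(seq_len: int, num_layers: int, sliding_window: int) -> int:
    """Tokens whose KV entries are needed by at least one layer for the last query."""
    query_pos = seq_len - 1
    return max(
        len(visible_positions(query_pos, layer_idx, sliding_window))
        for layer_idx in range(num_layers)
    )
```

Because the global layers need every earlier token, the cache has to be sized for the full sequence (for example, `kv_tokens_to_keep(10, 2, 4)` is 10), while the window only restricts what the local layers read during computation.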

github-actions · 2024-11-22T23:57:27Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

youkaichao · 2024-11-23T00:02:40Z

fixes #9517

WoosukKwon

Just a heads up: the paged attention kernel we use for the xFormers backend doesn't support sliding window attention. This PR will introduce a slight correctness bug in the xformers backend.

youkaichao · 2024-11-23T01:36:20Z

> Just a heads up: the paged attention kernel we use for the xFormers backend doesn't support sliding window attention. This PR will introduce a slight correctness bug in the xformers backend.

For xformers, we keep the original behavior of capping the max model length to the sliding window size.
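As a rough illustration of the xformers fallback described in this reply, the sketch below caps the usable length at the sliding window size only for that backend. The function name, parameters, and backend strings are hypothetical and are not taken from the vLLM code base; the numbers in the usage note are illustrative.

```python
from typing import Optional


def effective_max_model_len(
    max_position_embeddings: int,
    sliding_window: Optional[int],
    backend: str,
    interleaved: bool,
) -> int:
    """Hypothetical helper: choose the context length the engine will allow."""
    # No window, or no alternating local/global layers: nothing to cap here.
    if sliding_window is None or not interleaved:
        return max_position_embeddings
    # Assumption: the xformers path cannot apply a per-layer window, so the
    # old behavior is kept and the length is capped at the window size.
    if backend.upper() == "XFORMERS":
        return min(max_position_embeddings, sliding_window)
    # Other backends apply the window inside attention, so the full context
    # length remains available.
    return max_position_embeddings
```

With an 8192-token model and a 4096-token window, this returns 4096 for `"XFORMERS"` and 8192 for any other backend string.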

noamgat · 2024-11-23T20:05:28Z

This looks great! Which attention backend do you recommend for gemma 2 now?

youkaichao · 2024-11-24T00:30:21Z

@noamgat the default one (flash attention) should work.

azsh1725 · 2024-12-09T17:31:16Z

Hi! Thanks for this fix.

Can you, please, tell me when the release with this fix is planned?

azsh1725 · 2025-01-07T18:49:52Z

> Hi! Thanks for this fix.
>
> Can you, please, tell me when the release with this fix is planned?

For those interested, the release with this fix is version 0.6.5.

youkaichao added 5 commits November 22, 2024 15:20

fix alternating sliding window

f0401d5

Signed-off-by: youkaichao <[email protected]>

add tests

7c4700d

Signed-off-by: youkaichao <[email protected]>

add tests

c846ff7

Signed-off-by: youkaichao <[email protected]>

add comments

cab0770

Signed-off-by: youkaichao <[email protected]>

add comments

615020a

Signed-off-by: youkaichao <[email protected]>

youkaichao requested a review from WoosukKwon November 22, 2024 23:59

WoosukKwon approved these changes Nov 23, 2024

WoosukKwon reviewed Nov 23, 2024

youkaichao added 3 commits November 22, 2024 17:29

restore old behavior for xformers

5f8c223

Signed-off-by: youkaichao <[email protected]>

skip tests

553069a

Signed-off-by: youkaichao <[email protected]>

fix xformers

2fd08d4

Signed-off-by: youkaichao <[email protected]>

youkaichao enabled auto-merge (squash) November 23, 2024 01:35

github-actions bot added the ready label Nov 23, 2024

youkaichao disabled auto-merge November 23, 2024 04:13

youkaichao merged commit 4aba6e3 into vllm-project:main Nov 23, 2024
63 of 68 checks passed

youkaichao deleted the fix_gemma2 branch November 23, 2024 04:13

patrickvonplaten mentioned this pull request Nov 23, 2024

Interleaving sliding window for Ministral-8B-Instruct-2410 #10591

Merged

yxchng mentioned this pull request Nov 28, 2024

[Installation]: vLLM build from source errors #8532

Closed

1 task

youkaichao mentioned this pull request Dec 12, 2024

[Usage]: Can we extend the context length of gemma2 model or other models? #10548

Closed

1 task

sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024

[core] gemma2 full context length support (vllm-project#10584)

d32ce32

Signed-off-by: youkaichao <[email protected]>

simon-mo mentioned this pull request Jan 25, 2025

[Model] Enable Inference Support for the New Baichuan-M1 Model #12251

Closed

anko-intel pushed a commit to HabanaAI/vllm-fork that referenced this pull request Feb 12, 2025

[core] gemma2 full context length support (vllm-project#10584)

2da7606

Signed-off-by: youkaichao <[email protected]>


4 participants
