[V1] [Hybrid] Some additional clean-up in Mamba2 prefix caching #26222
Signed-off-by: Thomas Parnell <[email protected]>
Code Review
This pull request provides a valuable cleanup of the Mamba2 prefix caching logic. The variable renames, such as from last_state_idx to block_idx_last_computed_token, significantly improve code clarity and maintainability across multiple files, including the Triton kernels. The refactoring of the state-saving loop in mamba_mixer2.py and the index calculation logic in Mamba2AttentionMetadataBuilder are well-executed, making the implementation more straightforward and easier to follow. The changes are consistent and appear correct. Overall, this is a solid improvement to the codebase.
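The new names describe each index by what it points at: the cache block that holds a particular token. The sketch below illustrates that block-index arithmetic using a simple model of a block-based prefix cache. It is not vLLM's code. The helper names `cdiv`, `block_idx_last_computed_token_of` and `block_idx_last_scheduled_token_of` are hypothetical, and the formulas assume fixed-size blocks, where token `t` (0-based) lives in block `t // block_size`.

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division for non-negative integers."""
    return -(-a // b)


def block_idx_last_computed_token_of(num_computed_tokens: int, block_size: int) -> int:
    # Hypothetical helper (assumption, not vLLM's API): index of the block holding
    # the last token whose state has already been computed; -1 means none yet.
    return cdiv(num_computed_tokens, block_size) - 1


def block_idx_last_scheduled_token_of(
    num_computed_tokens: int, num_scheduled_tokens: int, block_size: int
) -> int:
    # Hypothetical helper (assumption): index of the block holding the last token
    # scheduled in the current step.
    return cdiv(num_computed_tokens + num_scheduled_tokens, block_size) - 1


if __name__ == "__main__":
    block_size = 16
    print(block_idx_last_computed_token_of(32, block_size))       # tokens 0..31 fill blocks 0 and 1
    print(block_idx_last_computed_token_of(33, block_size))       # token 32 spills into block 2
    print(block_idx_last_scheduled_token_of(32, 20, block_size))  # tokens 32..51 end in block 3
```

Naming the index after the token it locates, rather than after a generic "state", makes the off-by-one convention explicit: a value of -1 unambiguously means no computed block exists yet.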
💡 Codex Review
Here are some automated review suggestions for this pull request.
heheda12345 left a comment
LGTM! Thank you very much.
…-project#26222) Signed-off-by: Thomas Parnell <[email protected]> Signed-off-by: Karan Goel <[email protected]>
…-project#26222) Signed-off-by: Thomas Parnell <[email protected]>
…-project#26222) Signed-off-by: Thomas Parnell <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>
Purpose
This PR addresses the review comments from @heheda12345 that I didn't manage to incorporate before merging.
Test Plan
The hybrid tests pass locally for me.
Test Result