@tdoublep

What does this PR do?

This fixes an issue for models such as mistralai/Mamba-Codestral-7B-v0.1 that use the mamba2 architecture with n_groups > 1. It is equivalent to the fix for Zamba from #35943.

This issue prevents us from comparing against transformers as a baseline in the vLLM CI for this model.
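To illustrate why n_groups > 1 matters, here is a minimal pure-Python sketch. It assumes the fix makes the gated RMSNorm compute its mean square per group of `group_size` channels instead of over the whole hidden dimension. The function name `gated_rmsnorm` and the toy inputs are hypothetical and are not taken from the transformers source.

```python
import math


def gated_rmsnorm(hidden, gate, weight, group_size, eps=1e-6):
    """Gated RMSNorm over a 1-D list, normalizing each group separately.

    Hypothetical helper for illustration only; not the library implementation.
    """
    # Apply the SiLU gate first: h * gate * sigmoid(gate).
    gated = [h * g / (1.0 + math.exp(-g)) for h, g in zip(hidden, gate)]
    out = []
    # Normalize each contiguous block of `group_size` channels on its own.
    for start in range(0, len(gated), group_size):
        group = gated[start:start + group_size]
        mean_sq = sum(v * v for v in group) / len(group)
        scale = 1.0 / math.sqrt(mean_sq + eps)
        out.extend(v * scale for v in group)
    # Apply the learned per-channel weight last.
    return [w * v for w, v in zip(weight, out)]


hidden = [1.0, 1.0, 10.0, 10.0]
gate = [1.0, 1.0, 1.0, 1.0]
weight = [1.0, 1.0, 1.0, 1.0]

# Two groups of two channels: each group is normalized independently.
grouped = gated_rmsnorm(hidden, gate, weight, group_size=2)
# One group spanning all channels: the large values dominate the statistic.
ungrouped = gated_rmsnorm(hidden, gate, weight, group_size=len(hidden))
```

With these toy inputs the two calls give clearly different outputs, so a reference implementation that ignores the grouping would not match one that respects it.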

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@pglorio @vasqu @ArthurZucker @hmellor

Signed-off-by: Thomas Parnell <[email protected]>
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: bamba, falcon_h1, granitemoehybrid, mamba2

Contributor

@vasqu vasqu left a comment

Can we also let the zamba2 model inherit this from mamba2 then? Let's not introduce alternatives here

And could you add a fast test to mamba2 that might catch this?

Comment on lines +54 to +55
    def __init__(self, hidden_size, group_size, eps=1e-6):
        super().__init__(hidden_size, group_size, eps)
Contributor

Suggested change
-    def __init__(self, hidden_size, group_size, eps=1e-6):
-        super().__init__(hidden_size, group_size, eps)
+    pass

Weird that this even used the init here; we shouldn't need to do anything on the modular side.
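As a small illustration of the suggestion, the sketch below shows that a subclass whose body is only `pass` inherits the parent's `__init__` unchanged, so re-declaring it just to call `super().__init__` with the same arguments adds nothing. The class names `BaseRMSNormGated` and `DerivedRMSNormGated` are hypothetical stand-ins, not the actual classes in the modular files.

```python
class BaseRMSNormGated:
    """Hypothetical parent class with the signature quoted in the review."""

    def __init__(self, hidden_size, group_size, eps=1e-6):
        self.hidden_size = hidden_size
        self.group_size = group_size
        self.eps = eps


class DerivedRMSNormGated(BaseRMSNormGated):
    # No __init__ override: the parent's constructor is used as-is.
    pass


norm = DerivedRMSNormGated(hidden_size=8, group_size=4)
```

Here `norm.group_size` and `norm.eps` are set by the inherited constructor, which is the behavior the `pass`-only body relies on.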

Contributor

Same for the other modular files.
