Enable headless models for pooling in the Transformers backend #21767
Conversation
Signed-off-by: Harry Mellor <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. 🚀
Code Review
This pull request enables loading headless models for embedding with the Transformers backend, which is a great addition. The changes in the configuration and registry look correct, and the new test case covers the intended scenarios.
However, I've found a critical issue in the WeightsMapper implementation for the new TransformersModel. The current logic for prefixing weights is flawed due to incorrect key ordering in the dictionary, which will cause weight loading to fail for one of the model formats this PR aims to support. I've provided a detailed comment and a code suggestion to fix this.
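To illustrate why dictionary key ordering matters here, below is a minimal sketch of a first-match prefix mapper. This is a simplified stand-in, not vLLM's actual `WeightsMapper` code, and the specific prefix mappings are hypothetical examples chosen to show the failure mode described above.

```python
# Minimal sketch of first-match prefix mapping, assuming mappings are
# applied in dict insertion order and the first matching prefix wins.
def map_name(name: str, orig_to_new_prefix: dict[str, str]) -> str:
    for old, new in orig_to_new_prefix.items():
        if name.startswith(old):
            return new + name[len(old):]
    return name

# Buggy ordering (hypothetical): the catch-all "" prefix matches every
# weight name first, so an already-prefixed weight gets double-prefixed.
buggy = {"": "model.", "model.": "model."}
print(map_name("model.layers.0.weight", buggy))  # model.model.layers.0.weight (wrong)
print(map_name("layers.0.weight", buggy))        # model.layers.0.weight (ok)

# Correct ordering: the more specific prefix must come first so that
# both checkpoint layouts end up under the same "model." prefix.
fixed = {"model.": "model.", "": "model."}
print(map_name("model.layers.0.weight", fixed))  # model.layers.0.weight (ok)
print(map_name("layers.0.weight", fixed))        # model.layers.0.weight (ok)
```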
Signed-off-by: Harry Mellor <[email protected]>
Thanks for extending this support!
Signed-off-by: Harry Mellor <[email protected]>
The mapper is having issues; I'll disable auto-merge for now.
Signed-off-by: Harry Mellor <[email protected]>
Previously, embedding model checkpoints that had their layers at the root of the checkpoint would not load correctly with the Transformers backend.
This PR enables the loading of Transformers base model classes.
Now, both of the following formats of checkpoint will work for pooling tasks:
- `ModelForCausalLM` — weights nested under the `model.` prefix
- `Model` — weights at the root of the checkpoint (headless)
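To make the two layouts concrete, here is a minimal usage sketch. The model name is hypothetical, and the weight names in the comments use Llama-style naming purely as an illustration; the exact layout depends on the architecture.

```python
from vllm import LLM

# "ModelForCausalLM"-style checkpoint (weights nested under "model."):
#   model.embed_tokens.weight
#   model.layers.0.self_attn.q_proj.weight
#   ...
#   lm_head.weight
#
# "Model"-style headless checkpoint (weights at the root):
#   embed_tokens.weight
#   layers.0.self_attn.q_proj.weight
#   ...

# Hypothetical headless checkpoint; model_impl="transformers" forces
# the Transformers backend, and task="embed" selects pooling.
llm = LLM(
    model="my-org/my-headless-model",
    task="embed",
    model_impl="transformers",
)

(output,) = llm.embed(["Headless checkpoints can now be pooled."])
print(len(output.outputs.embedding))
```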