[VLM][Model] TP support for ViTs #7186
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can do one of these: ... 🚀
[Intermediate status]
With the following simple test code, I can successfully run all listed models on both ...
This is great and thank you so much for the implementation and thorough testing coverage. I will take a look this week and get back to you!
@ywang96 @DarkLight1337
LGTM! I've run all the models again with your test file, so let's get this in! Thank you for the work! @ChristopherCho
Ah... this will actually break the CPU test. How about using transformers ...
@ywang96
@ywang96 I've checked the error message and found some issues here. However, for other VLM models, they're deselected (as ...). I think we can avoid the error by just importing ...
@ChristopherCho I see, let me try to move those import statements to inside the `run_test` call and see if that helps.
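For reference, a minimal sketch of the deferred-import workaround being discussed here; `gpu_only_module` and `build_model` are hypothetical placeholders, not names from the actual test file:

```python
def run_test(model_name: str) -> None:
    # Import inside the test body instead of at module scope, so that pytest
    # can still collect (and deselect) this test on a CPU-only build where
    # the GPU-only dependency would fail to import.
    from gpu_only_module import build_model  # hypothetical placeholder

    model = build_model(model_name)
    # ... run the model and check outputs ...
```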
#8055 (comment)
As a follow-up PR to #6942, I've implemented the TP version of various ViTs. The following models have been changed:

Following `Idefics2VisionAttention`, I've used the `memory_efficient_attention_forward` from `xformers`. To load the models correctly, the `load_weights` part of the models that use these ViTs should also be updated. Thus, the following models have been changed.
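For illustration, here is a minimal sketch of the attention pattern described above: the fused QKV projection is sharded column-wise, the output projection row-wise, and the attention itself uses `memory_efficient_attention_forward`. The class name and constructor details below are assumptions modeled on the `Idefics2VisionAttention`-style layers and may differ from the actual diff:

```python
import torch
import torch.nn as nn
from xformers import ops as xops

from vllm.distributed import get_tensor_model_parallel_world_size
from vllm.model_executor.layers.linear import (QKVParallelLinear,
                                               RowParallelLinear)


class ParallelVisionAttention(nn.Module):
    """ViT self-attention with the QKV projection sharded column-wise and
    the output projection sharded row-wise across TP ranks (sketch)."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.head_dim = hidden_size // num_heads
        self.scale = self.head_dim ** -0.5

        tp_size = get_tensor_model_parallel_world_size()
        # Each rank holds num_heads / tp_size attention heads.
        self.num_heads_per_rank = num_heads // tp_size

        self.qkv_proj = QKVParallelLinear(hidden_size, self.head_dim,
                                          num_heads, bias=True)
        self.out_proj = RowParallelLinear(hidden_size, hidden_size, bias=True)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        bsz, seq_len, _ = hidden_states.shape
        qkv, _ = self.qkv_proj(hidden_states)   # local shard of [q, k, v]
        q, k, v = qkv.chunk(3, dim=-1)

        # xformers expects (batch, seq, heads, head_dim).
        q = q.view(bsz, seq_len, self.num_heads_per_rank, self.head_dim)
        k = k.view(bsz, seq_len, self.num_heads_per_rank, self.head_dim)
        v = v.view(bsz, seq_len, self.num_heads_per_rank, self.head_dim)

        out = xops.memory_efficient_attention_forward(q, k, v, scale=self.scale)
        out = out.reshape(bsz, seq_len, -1)

        # RowParallelLinear all-reduces the partial outputs across TP ranks.
        attn_out, _ = self.out_proj(out)
        return attn_out
```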
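And a hedged sketch of the corresponding `load_weights` change: checkpoints still store separate `q_proj`/`k_proj`/`v_proj` tensors, so they have to be routed into the fused, TP-sharded `qkv_proj` parameter via its `weight_loader` with the proper shard id. The names follow the common vLLM convention but are assumptions, not the exact diff:

```python
from typing import Iterable, Tuple

import torch

from vllm.model_executor.model_loader.weight_utils import default_weight_loader


def load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]):
    # (target param name, checkpoint weight name, shard id)
    stacked_params_mapping = [
        ("qkv_proj", "q_proj", "q"),
        ("qkv_proj", "k_proj", "k"),
        ("qkv_proj", "v_proj", "v"),
    ]
    params_dict = dict(self.named_parameters())
    for name, loaded_weight in weights:
        for param_name, ckpt_name, shard_id in stacked_params_mapping:
            if ckpt_name not in name:
                continue
            name = name.replace(ckpt_name, param_name)
            param = params_dict[name]
            # weight_loader places the checkpoint tensor into the correct
            # shard of the fused parameter on this TP rank.
            param.weight_loader(param, loaded_weight, shard_id)
            break
        else:
            # All other weights load as-is (or via their own weight_loader).
            param = params_dict[name]
            weight_loader = getattr(param, "weight_loader",
                                    default_weight_loader)
            weight_loader(param, loaded_weight)
```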