Adding Idefics multi modal model. #842
Co-Authored-By: Victor Sanh <[email protected]>
Next:
- precompute cos/sin
- Fix KV layout
- Refactor everything for flash (long).
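To illustrate the "precompute cos/sin" item: a minimal, framework-free sketch of building rotary-embedding cos/sin tables once and reusing them per position. This is not the PR's code. The function names, the `base=10000.0` default, and the half-split rotation convention are assumptions made for illustration.

```python
import math


def precompute_cos_sin(head_dim, max_positions, base=10000.0):
    """Build cos/sin lookup tables of shape (max_positions, head_dim // 2).

    Hypothetical helper: computed once up front so the forward pass only
    indexes into the tables instead of recomputing trigonometric functions.
    """
    inv_freq = [base ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]
    cos = [[math.cos(pos * f) for f in inv_freq] for pos in range(max_positions)]
    sin = [[math.sin(pos * f) for f in inv_freq] for pos in range(max_positions)]
    return cos, sin


def apply_rotary(vec, cos_row, sin_row):
    """Rotate one head vector using the table rows for its position.

    Assumes the half-split convention: element i is paired with
    element i + head_dim // 2.
    """
    half = len(vec) // 2
    x1, x2 = vec[:half], vec[half:]
    out1 = [a * c - b * s for a, b, c, s in zip(x1, x2, cos_row, sin_row)]
    out2 = [b * c + a * s for a, b, c, s in zip(x1, x2, cos_row, sin_row)]
    return out1 + out2


# Tables are built once, then looked up by position.
cos_table, sin_table = precompute_cos_sin(head_dim=4, max_positions=8)
query = [1.0, 2.0, 3.0, 4.0]
rotated_at_0 = apply_rotary(query, cos_table[0], sin_table[0])
rotated_at_5 = apply_rotary(query, cos_table[5], sin_table[5])
```

Position 0 leaves the vector unchanged, and every other position applies a pure rotation, so the vector norm is preserved.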
Fused KV in attention (QKV cannot be fused for cross attention because the shapes are different). Fused gate + up. At ~40ms.
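The fusion described in this commit message can be sketched without any framework. The sketch below is not the PR's implementation. It assumes that K and V are projected from the same input (here called `image_states`), so their weight matrices can be stacked and applied in one matmul, while a query projected from a differently shaped input could not join that stack. All names are hypothetical.

```python
def matmul(x, w):
    """Multiply x (n x d_in) by w (d_in x d_out); both are lists of rows."""
    return [
        [sum(xi * wi for xi, wi in zip(row, col)) for col in zip(*w)]
        for row in x
    ]


def fuse_kv(w_k, w_v):
    """Stack K and V weights column-wise: (d_in x d_k) + (d_in x d_v)."""
    return [row_k + row_v for row_k, row_v in zip(w_k, w_v)]


def fused_kv_projection(x, w_kv, d_k):
    """Run one matmul for both projections, then split the output columns."""
    kv = matmul(x, w_kv)
    keys = [row[:d_k] for row in kv]
    values = [row[d_k:] for row in kv]
    return keys, values


# Toy data: 2 tokens with input width 3, key/value width 2.
image_states = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]
w_k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_v = [[2.0, 1.0], [0.0, -1.0], [1.0, 0.0]]

w_kv = fuse_kv(w_k, w_v)  # done once, at load time
keys, values = fused_kv_projection(image_states, w_kv, d_k=2)
```

The fused path returns the same keys and values as two separate matmuls while launching a single one. The same stacking would apply to "gate + up", assuming both projections consume the same input.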
yaaaay! thanks for all the improvements!
server/text_generation_server/models/custom_modeling/idefics_modeling.py
LGTM 👍
I have suggested a few minor nits in the transformers PR (here) as a result of this review, which may optionally be ported here as well :)
Co-Authored-By: Victor Sanh [email protected]
What does this PR do?
Fixes # (issue)
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.