modify context length for GPTQ + version bump #25899
Conversation
The documentation is not available anymore as the PR was closed or merged.
Amazing work @SunMarc , thanks a lot! 🔥
Thanks, left a few nits on the doc. Could you link to / detail what `exllama` and `act_order` are?
Co-authored-by: Arthur <[email protected]>
It works with the new model path :)
Thanks! (make sure to rebase on main before merging for the failing tests)
* add new arg for gptq
* add tests
* add min version autogptq
* fix order
* skip test
* fix
* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <[email protected]>

* fix style
* change model path

---------

Co-authored-by: Arthur <[email protected]>
What does this PR do?
This PR adds the possibility to change the max input length when using the exllama backend with act_order. We also bump the required version of auto-gptq to 0.4.2. The GPTQ tests pass; I skipped one test because we need to wait for a release on the optimum side.