
Conversation

@veezbo (Contributor) commented Aug 29, 2023

What does this PR do?

The documentation for efficient single-GPU training previously mentioned that the adamw_bnb_8bit optimizer could only be integrated using a third-party implementation. However, this is now available in Trainer directly as a result of this issue and corresponding PR.

I think it's valuable to keep the 8-bit Adam entry in the documentation, as it's a significant improvement over Adafactor. It's also worth keeping the sample integration with a third-party optimizer implementation for reference purposes. I have adjusted the documentation accordingly.
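To make the memory argument concrete, here is a rough back-of-the-envelope sketch (my own illustration, not from the PR) of the optimizer-state savings: standard Adam keeps two fp32 state tensors per parameter, while 8-bit Adam stores the same states quantized to one byte each (ignoring the small overhead of block-wise quantization statistics).

```python
def adam_state_bytes(n_params: int, bytes_per_state: int) -> int:
    """Memory for Adam's two per-parameter state tensors (exp_avg, exp_avg_sq)."""
    return 2 * bytes_per_state * n_params

n = 3_000_000_000  # e.g. a 3B-parameter model

fp32_states = adam_state_bytes(n, 4)  # standard AdamW: fp32 states
int8_states = adam_state_bytes(n, 1)  # 8-bit Adam: quantized states

print(f"fp32 Adam states:  {fp32_states / 2**30:.1f} GiB")  # ~22.4 GiB
print(f"8-bit Adam states: {int8_states / 2**30:.1f} GiB")  # ~5.6 GiB
```

The 4x reduction in optimizer-state memory is what makes 8-bit Adam attractive for single-GPU fine-tuning.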

I was able to validate that both approaches, using Trainer directly with the optim flag and doing the third-party integration, still work when fine-tuning small LLMs on a single GPU.
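For reference, the two approaches described above can be sketched roughly as follows. This is a sketch, not a runnable end-to-end script: `model` and `train_ds` are hypothetical placeholders, and it assumes `transformers`, `bitsandbytes`, and a CUDA GPU are available.

```python
# Sketch only: `model` and `train_ds` are placeholders you would create with
# AutoModelForCausalLM / a datasets.Dataset; requires transformers + bitsandbytes.
import bitsandbytes as bnb
from transformers import Trainer, TrainingArguments

# Approach 1: built-in support -- let Trainer instantiate 8-bit Adam itself.
args = TrainingArguments(output_dir="out", optim="adamw_bnb_8bit")
trainer = Trainer(model=model, args=args, train_dataset=train_ds)

# Approach 2: third-party integration -- construct the optimizer yourself and
# pass it to Trainer (the second slot of the tuple is an optional LR scheduler).
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=2e-5)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    optimizers=(optimizer, None),
)
# trainer.train()  # then train as usual
```

Approach 1 is the one this PR documents as now built in; approach 2 remains in the docs as a template for integrating any third-party optimizer.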

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@stevhliu and @MKhalusova

@ydshieh (Collaborator) commented Aug 29, 2023

cc @younesbelkada for BNB related stuff 🙏

@younesbelkada (Contributor) left a comment

This looks good, thanks, I left one question!
cc @SunMarc as well

@stevhliu (Member) left a comment

Thanks, I left some suggestions to make it more concise!

@veezbo (Contributor, Author) commented Aug 30, 2023

Thanks @stevhliu for the suggestions!

@younesbelkada (Contributor) left a comment

Thanks a lot!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@amyeroberts (Contributor) left a comment

Thanks for adding!

@amyeroberts merged commit 99fc3ac into huggingface:main Aug 31, 2023
@veezbo deleted the vibhorkumar.8bitadam_documentation_update branch September 4, 2023 18:17
parambharat pushed a commit to parambharat/transformers that referenced this pull request Sep 26, 2023: Modify single-GPU efficient training doc with now-available adamw_bnb_8bit optimizer (huggingface#25807)

* Modify single-GPU efficient training doc with now-available adamw_bnb_8bit optimizer

* Apply suggestions from code review

Co-authored-by: Steven Liu <[email protected]>

---------

Co-authored-by: Steven Liu <[email protected]>