@NielsRogge
Contributor

What does this PR do?

Some models need several files with the same name. For example, the processor of the new InstructBLIP model (#23460) consists of two tokenizers, because the model internally uses two different text models. Both tokenizers require identically named files, such as tokenizer_config.json.

Hence, it would be useful to create subfolders in model repos that hold, for instance, all files of one particular tokenizer (similar to how the Diffusers library does this). For InstructBLIP, I created a separate qformer_tokenizer folder for this, as can be seen here. I had to adapt the save_pretrained and from_pretrained methods of InstructBlipProcessor to save the files to a separate "qformer_tokenizer" folder and read them back in. These overrides are probably specific to InstructBLIP, given that the folder name is custom.
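
To illustrate the layout described above, here is a minimal stdlib-only sketch, not the actual InstructBlipProcessor code. The helper names `save_two_tokenizer_configs` and `load_two_tokenizer_configs` are hypothetical, and plain dicts stand in for real tokenizer configs; only the file name tokenizer_config.json and the folder name qformer_tokenizer come from the description.

```python
import json
import os

# Folder name taken from the PR description.
QFORMER_SUBFOLDER = "qformer_tokenizer"
CONFIG_NAME = "tokenizer_config.json"


def save_two_tokenizer_configs(save_directory, main_config, qformer_config):
    """Hypothetical helper: write two same-named config files without a clash.

    The main tokenizer's config goes in the root of save_directory, the
    second tokenizer's config goes in the qformer_tokenizer subfolder.
    """
    qformer_dir = os.path.join(save_directory, QFORMER_SUBFOLDER)
    os.makedirs(qformer_dir, exist_ok=True)  # also creates save_directory

    with open(os.path.join(save_directory, CONFIG_NAME), "w") as f:
        json.dump(main_config, f)
    with open(os.path.join(qformer_dir, CONFIG_NAME), "w") as f:
        json.dump(qformer_config, f)


def load_two_tokenizer_configs(save_directory):
    """Hypothetical helper: read both configs back from the same layout."""
    with open(os.path.join(save_directory, CONFIG_NAME)) as f:
        main_config = json.load(f)
    with open(os.path.join(save_directory, QFORMER_SUBFOLDER, CONFIG_NAME)) as f:
        qformer_config = json.load(f)
    return main_config, qformer_config
```

The point of the subfolder is simply that both files can keep their expected name while living in one repo.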

However, push_to_hub currently doesn't support uploading files that live in subfolders. This PR adds that functionality.
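
As a rough sketch of what supporting subfolders involves, the snippet below walks a local directory recursively and returns repo-relative paths, so that nested files such as qformer_tokenizer/tokenizer_config.json are picked up alongside top-level ones. It uses only the standard library; `collect_files_recursively` is a hypothetical name and this is not the code changed in this PR.

```python
import os


def collect_files_recursively(working_dir):
    """Hypothetical helper: list every file under working_dir.

    Paths are returned relative to working_dir with forward slashes,
    which is the form a repo path would take, and sorted for stable output.
    """
    collected = []
    for root, _dirs, files in os.walk(working_dir):
        for name in files:
            full_path = os.path.join(root, name)
            rel_path = os.path.relpath(full_path, working_dir)
            # Normalize Windows separators to forward slashes.
            collected.append(rel_path.replace(os.sep, "/"))
    return sorted(collected)
```

A non-recursive listing of the top-level directory would return the subfolder name but none of the files inside it, which is the gap described above.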

@NielsRogge NielsRogge requested a review from sgugger May 31, 2023 18:57
Collaborator

@sgugger sgugger left a comment

Makes sense, thanks!

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented May 31, 2023

The documentation is not available anymore as the PR was closed or merged.

@sgugger sgugger merged commit 6affd9c into huggingface:main May 31, 2023
gojiteji pushed a commit to gojiteji/transformers that referenced this pull request Jun 5, 2023
This was referenced Jun 6, 2023
novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023