
Conversation

@ltBai ltBai commented Mar 5, 2025

What does this PR do?

We are trying to add a new model named long-vita to the transformers repository. It is a multimodal LLM (MLLM) that can handle long contexts of up to 1 million tokens. Looking forward to your feedback!

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

github-actions bot commented Mar 5, 2025

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).

@github-actions github-actions bot marked this pull request as draft March 5, 2025 07:33

Rocketknight1 commented Mar 6, 2025

Hi @ltBai,

In general, we recommend that most models are uploaded as custom code using the steps here, without needing a PR to the transformers library. This will let you share the model immediately, and it'll work exactly the same as a library model (except that users will need to set `trust_remote_code=True`).
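A minimal sketch of that custom-code workflow (the class names, the `long-vita` `model_type`, and the `your-org/long-vita` repo id below are illustrative placeholders, not part of this PR):

```python
import torch
from transformers import PretrainedConfig, PreTrainedModel


class LongVitaConfig(PretrainedConfig):
    model_type = "long-vita"  # hypothetical identifier for the custom model

    def __init__(self, hidden_size=1024, **kwargs):
        self.hidden_size = hidden_size
        super().__init__(**kwargs)


class LongVitaModel(PreTrainedModel):
    config_class = LongVitaConfig

    def __init__(self, config):
        super().__init__(config)
        # stand-in layer for the real long-context multimodal architecture
        self.proj = torch.nn.Linear(config.hidden_size, config.hidden_size)

    def forward(self, hidden_states):
        return self.proj(hidden_states)


# Register the classes so the modeling code is uploaded next to the weights
LongVitaConfig.register_for_auto_class()
LongVitaModel.register_for_auto_class("AutoModel")

# LongVitaModel(LongVitaConfig()).push_to_hub("your-org/long-vita")
```

Users then load it exactly like a library model, opting in to the remote code:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("your-org/long-vita", trust_remote_code=True)
```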

In general we only accept PRs to add new architectures to the core transformers library when one of these is true:

  • There's a pretrained model with a lot of interested users
  • There's a paper on the architecture with SOTA results or lots of interest/citations
  • The model comes from a company or research group whose past models have gotten a lot of usage (because this means the new model will probably get a lot of users too)

The reason for this is that once a model is actually in transformers itself, then the team at Hugging Face takes full responsibility for maintaining the code, testing it and making sure it stays compatible with new versions of transformers. We can't do this for every model architecture!

Remember that a model being a custom code model doesn't make it less important. A lot of extremely popular and high-performance models are custom code models and don't have Transformers PRs. For example, Phi-4-multimodal is the top trending model on Hugging Face today, and it is also a custom code model without a library PR! Starting with a custom code model is definitely the right approach for most authors.
