Conversation

@aryasaatvik

Related Issue

Adds support for the Qwen3 model architecture.

Summary

This PR updates the transformers dependency to ensure compatibility with Qwen3 models.

Changes:

  • Updated transformers version constraint in pyproject.toml to >=4.51.3,<=5.0 (see the sketch below)
  • Regenerated poetry.lock with updated dependencies
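
For reference, the changed constraint in libs/infinity_emb/pyproject.toml looks roughly like this (a sketch; the surrounding file layout and any extras markers may differ):

```toml
# libs/infinity_emb/pyproject.toml (excerpt, sketch)
[tool.poetry.dependencies]
# raised from ">=4.47.0,<=5.0"; Qwen3 support landed in the transformers 4.51 series
transformers = ">=4.51.3,<=5.0"
```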

Why this is needed:

  • Enables users to serve Qwen3 models
  • Keeps dependencies up to date with latest model architectures

Checklist

  • I have read the CONTRIBUTING guidelines.
  • I have added tests to cover my changes.
  • I have updated the documentation (docs folder) accordingly.

Additional Notes

This is a dependency update that maintains backward compatibility.


@greptile-apps greptile-apps bot left a comment


PR Summary

Updates transformers dependency in libs/infinity_emb/pyproject.toml to support Qwen3 architecture by raising minimum version to 4.51.3.

  • Updated transformers dependency from >=4.47.0,<=5.0 to >=4.51.3,<=5.0 while maintaining backward compatibility
  • Missing test coverage for Qwen3 model support despite being a checklist item
  • No documentation updates in docs folder to reflect new model support
  • Poetry.lock updated but changes not included in the review context

1 file reviewed, no comments

@aryasaatvik changed the title from "Chore: Update transformers dependency to support Qwen3 architecture" to "chore: update transformers to support Qwen3 architecture" on Jul 10, 2025
@Abhi011999

Can we please merge this, @michaelfeil? Qwen3 is now the current SOTA.

@zengqingfu1442

Updating transformers alone does not solve the problem; Qwen3-Reranker-8B still cannot be deployed. See #611 (comment).

@aryasaatvik
Author

> Updating transformers alone does not solve the problem; Qwen3-Reranker-8B still cannot be deployed. See #611 (comment).

Try using tomaarsen/Qwen3-Reranker-4B-seq-cls.
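
For anyone trying that checkpoint outside infinity, here is a minimal scoring sketch with plain transformers. It assumes the seq-cls conversion loads as a standard single-logit sequence-classification model, as its model card describes; the query/document strings are placeholders:

```python
# Sketch: scoring a query/document pair with the seq-cls conversion.
# Assumes tomaarsen/Qwen3-Reranker-4B-seq-cls exposes a single-logit
# Qwen3ForSequenceClassification head; the strings below are placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "tomaarsen/Qwen3-Reranker-4B-seq-cls"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

query = "what is infinity?"
doc = "Infinity is a high-throughput embedding and reranking server."
inputs = tokenizer(query, doc, truncation=True, return_tensors="pt")
with torch.no_grad():
    score = model(**inputs).logits.squeeze(-1)  # one relevance logit per pair
print(score.item())
```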

@zengqingfu1442

> > Updating transformers alone does not solve the problem; Qwen3-Reranker-8B still cannot be deployed. See #611 (comment).
>
> Try using tomaarsen/Qwen3-Reranker-4B-seq-cls.

What is the difference between tomaarsen/Qwen3-Reranker-8B-seq-cls and the official Qwen/Qwen3-Reranker-8B? Thanks.

@Abhi011999

The difference is mentioned here: https://huggingface.co/Qwen/Qwen3-Reranker-0.6B/discussions/3
They also implemented a somewhat hacky method to integrate with vllm: vllm-project/vllm#19260

Apparently, for the original Qwen/Qwen3-Reranker-8B, both the embedder and the reranker use the Qwen3ForCausalLM architecture.

I don't think infinity currently has a way to support the reranker through Qwen3ForSequenceClassification; vllm has already implemented it. I am wondering whether there's a way to support it here too.
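
For context, the causal-LM trick the official checkpoints use (per the Qwen3-Reranker model cards) reads the relevance score off the "yes"/"no" token logits at the final position. A rough sketch, with the real chat-template prompt simplified away and the 0.6B variant used for illustration:

```python
# Sketch of causal-LM reranker scoring: relevance = P("yes") vs P("no")
# at the last token. The official prompt template is simplified here.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Qwen/Qwen3-Reranker-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

yes_id = tokenizer.convert_tokens_to_ids("yes")
no_id = tokenizer.convert_tokens_to_ids("no")

prompt = "Query: what is infinity?\nDocument: Infinity is an embedding server.\nRelevant (yes/no):"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    last_logits = model(**inputs).logits[0, -1]  # logits at the final position
score = torch.softmax(last_logits[[no_id, yes_id]], dim=0)[1].item()
print(score)  # probability mass on "yes" = relevance score
```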

@aryasaatvik
Author

> The difference is mentioned here: https://huggingface.co/Qwen/Qwen3-Reranker-0.6B/discussions/3 They also implemented a somewhat hacky method to integrate with vllm: vllm-project/vllm#19260
>
> Apparently, for the original Qwen/Qwen3-Reranker-8B, both the embedder and the reranker use the Qwen3ForCausalLM architecture.
>
> I don't think infinity currently has a way to support the reranker through Qwen3ForSequenceClassification; vllm has already implemented it. I am wondering whether there's a way to support it here too.

I was able to get it working on Modal: https://gist.github.com/aryasaatvik/6f868d5f064b25e41cf1e20018721dca

@zengqingfu1442

> The difference is mentioned here: https://huggingface.co/Qwen/Qwen3-Reranker-0.6B/discussions/3 They also implemented a somewhat hacky method to integrate with vllm: vllm-project/vllm#19260
>
> Apparently, for the original Qwen/Qwen3-Reranker-8B, both the embedder and the reranker use the Qwen3ForCausalLM architecture.
>
> I don't think infinity currently has a way to support the reranker through Qwen3ForSequenceClassification; vllm has already implemented it. I am wondering whether there's a way to support it here too.

vllm 0.9.2 already supports deploying Qwen3-Reranker-8B and Qwen3-Embedding-8B.

@michaelfeil
Owner

An update of the transformers library will break all

  • onnx-optimum
  • bettertransformer dependencies

Migrating these libraries is a large undertaking. Migrating BERT from bettertransformer to e.g. torch.compile will lead to a ~50% performance loss in throughput and latency.
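
For illustration, the torch.compile path mentioned above would look roughly like this for a plain BERT encoder (a sketch, not how infinity currently wires up its torch engine):

```python
# Sketch: compiling a BERT encoder with torch.compile instead of relying
# on optimum's BetterTransformer fast path.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()
model = torch.compile(model)  # kernel fusion via TorchInductor

inputs = tokenizer(["hello world"], return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # per-token embeddings
```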
