[Dependabot] Update(deps): Bump transformers from 4.55.2 to 4.57.0 #2649

dependabot · 2025-10-03T21:03:07Z

Bumps transformers from 4.55.2 to 4.57.0.

Release notes

v4.57.0: Qwen3-Next, Vault Gemma, Qwen3 VL, LongCat Flash, Flex OLMO, LFM2 VL, BLT, Qwen3 OMNI MoE, Parakeet, EdgeTAM, OLMO3

New model additions

Qwen3 Next

The Qwen3-Next series represents the Qwen team's next-generation foundation models, optimized for extreme context length and large-scale parameter efficiency. The series introduces a suite of architectural innovations designed to maximize performance while minimizing computational cost:

Hybrid Attention: Replaces standard attention with the combination of Gated DeltaNet and Gated Attention, enabling efficient context modeling.

High-Sparsity MoE: Achieves an extremely low activation ratio of 1:50 in MoE layers, drastically reducing FLOPs per token while preserving model capacity.

Multi-Token Prediction (MTP): Boosts pretraining performance and accelerates inference.

Other Optimizations: Includes techniques such as zero-centered and weight-decayed layernorm, Gated Attention, and other stabilizing enhancements for robust training.
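To make the High-Sparsity MoE point concrete, here is a minimal sketch of generic top-k expert routing. It is not Qwen3-Next's implementation: the function names, the scalar "experts", and the toy sizes (50 experts, 1 active, chosen only to mirror the 1:50 ratio mentioned above) are all assumptions for illustration.

```python
import math
import random


def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    max_logit = max(router_logits[i] for i in top)  # subtract max for stability
    exps = [math.exp(router_logits[i] - max_logit) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def sparse_moe(x, experts, router_logits, k):
    """Combine the outputs of the k routed experts; the others are never called."""
    return sum(weight * experts[i](x)
               for i, weight in route_top_k(router_logits, k))


# Hypothetical toy setup: 50 scalar "experts", 1 active per token (a 1:50 ratio).
NUM_EXPERTS, ACTIVE = 50, 1
experts = [lambda x, scale=s: scale * x for s in range(NUM_EXPERTS)]
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

routes = route_top_k(logits, ACTIVE)
output = sparse_moe(2.0, experts, logits, ACTIVE)
print(f"evaluated {len(routes)} of {NUM_EXPERTS} experts, output = {output}")
```

Because only the routed experts are evaluated, compute per token scales with the number of active experts rather than the total, which is the mechanism behind the FLOPs reduction described above.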

Built on this architecture, they trained and open-sourced Qwen3-Next-80B-A3B — 80B total parameters, only 3B active — achieving extreme sparsity and efficiency.

Despite its ultra-efficiency, it outperforms Qwen3-32B on downstream tasks — while requiring less than 1/10 of the training cost. Moreover, it delivers over 10x higher inference throughput than Qwen3-32B when handling contexts longer than 32K tokens.

For more details, see the Qwen3-Next blog post.

Adding Support for Qwen3-Next by @bozheng-hit in #40771

Vault Gemma

VaultGemma is a text-only decoder model derived from Gemma 2. Notably, it drops the norms after the Attention and MLP blocks and uses full attention for all layers instead of alternating between full attention and local sliding attention. VaultGemma is available as a pretrained model with 1B parameters that uses a 1024-token sequence length.

VaultGemma was trained from scratch with sequence-level differential privacy (DP). Its training data includes the same mixture as the Gemma 2 models, consisting of a number of documents of varying lengths. Additionally, it is trained using DP stochastic gradient descent (DP-SGD) and provides a (ε ≤ 2.0, δ ≤ 1.1e-10)-sequence-level DP guarantee, where a sequence consists of 1024 consecutive tokens extracted from heterogeneous data sources. Specifically, the privacy unit of the guarantee is for the sequences after sampling and packing of the mixture.
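For readers unfamiliar with DP-SGD, the sketch below shows the generic idea of one step: clip each example's gradient to a fixed L2 norm, sum, add Gaussian noise scaled to that bound, and average. It is not VaultGemma's training code; the function names and parameter values are assumptions, and the privacy accounting that turns a noise level into an (ε, δ) guarantee is deliberately left out. In the sequence-level setting described above, each "example" would correspond to one 1024-token sequence.

```python
import math
import random


def l2_norm(vec):
    return math.sqrt(sum(v * v for v in vec))


def clip(vec, max_norm):
    """Scale vec down so its L2 norm is at most max_norm."""
    norm = l2_norm(vec)
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]


def dp_sgd_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(per_example_grads[0])
    summed = [sum(g[j] for g in clipped) for j in range(dim)]
    sigma = noise_multiplier * max_norm  # noise scale tied to the clipping bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [n / len(per_example_grads) for n in noisy]


# Hypothetical toy gradients: the first exceeds the clipping bound, the second does not.
grads = [[3.0, 4.0], [0.3, 0.4]]
noisy_grad = dp_sgd_gradient(grads, max_norm=1.0, noise_multiplier=1.0,
                             rng=random.Random(0))
print(noisy_grad)
```

Clipping bounds how much any single example can move the update, and the noise masks what remains of that contribution. Together, these two steps are what allow a formal guarantee to be stated at all.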

add: differential privacy research model by @RyanMullins in #40851

Qwen3 VL

Qwen3-VL is a multimodal vision-language model series, encompassing both dense and MoE variants, as well as Instruct and Thinking versions.

Building upon its predecessors, Qwen3-VL delivers significant improvements in visual understanding while maintaining strong pure text capabilities. Key architectural advancements include: enhanced MRope with interleaved layout for better spatial-temporal modeling, DeepStack integration to effectively leverage multi-level features from the Vision Transformer (ViT), and improved video understanding through text-based time alignment—evolving from T-RoPE to text timestamp alignment for more precise temporal grounding.

These innovations collectively enable Qwen3-VL to achieve superior performance in complex multimodal tasks.

Adding Support for Qwen3-VL Series by @JJJYmmm in #40795

LongCat Flash

The LongCatFlash model was proposed in the LongCat-Flash Technical Report by the Meituan LongCat Team. LongCat-Flash is a 560B-parameter Mixture-of-Experts (MoE) model that dynamically activates 18.6B-31.3B parameters per token (average ~27B). The model features a shortcut-connected architecture that enables high inference speed (>100 tokens/second) and advanced reasoning capabilities.
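As a quick sanity check on those figures, the short sketch below computes what share of the 560B parameters is active at the stated minimum, average, and maximum. The helper name is an assumption, and the average is the approximate ~27B from the text.

```python
def active_fraction(active_params_b, total_params_b):
    """Fraction of parameters used per token; both arguments are in billions."""
    return active_params_b / total_params_b


TOTAL_B = 560.0
for label, active_b in [("min", 18.6), ("average", 27.0), ("max", 31.3)]:
    print(f"{label}: {active_fraction(active_b, TOTAL_B):.1%}")
```

This prints roughly 3.3%, 4.8%, and 5.6%, so only a small slice of the model is exercised for any given token.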

... (truncated)

Commits

8ac2b91 Release: v4.57.0
2ccc6ca v4.57.0 Branch (#41310)
438343d Don't list dropout in eager_paged_attention_forward (#40924)
449da6b Add FlexOlmo model (#40921)
3bb1b48 Standardize audio embedding function name for audio multimodal models (#40919)
58e13b9 Update expected values for some test_speculative_generation (#40949)
529d3a2 Fix Glm4vModelTest::test_eager_matches_fa2_generate (#40947)
a2ac4de Remove nested import logic for torchvision (#40940)
8e837f6 Consistent naming for images kwargs (#40834)
eb04363 Raise error instead of warning when using meta device in from_pretrained (#40...
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot merge will merge this PR after your CI passes on it
@dependabot squash and merge will squash and merge this PR after your CI passes on it
@dependabot cancel merge will cancel a previously requested merge and block automerging
@dependabot reopen will reopen this PR if it is closed
@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [transformers](https://github.com/huggingface/transformers) from 4.55.2 to 4.57.0.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](huggingface/transformers@v4.55.2...v4.57.0)

---
updated-dependencies:
- dependency-name: transformers
  dependency-version: 4.57.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
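The commit metadata labels this bump as semver-minor. As a rough illustration of what that label means, the sketch below classifies an upgrade by the first version component that changes. It is not how Dependabot computes the label; the helper names are assumptions, and only plain MAJOR.MINOR.PATCH strings are handled (no pre-release or build suffixes).

```python
def parse_version(version):
    """Split a plain 'MAJOR.MINOR.PATCH' string into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def classify_bump(old, new):
    """Return 'major', 'minor', 'patch', or 'none' for an upgrade from old to new."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v < old_v:
        raise ValueError(f"{new} is older than {old}")
    for name, before, after in zip(("major", "minor", "patch"), old_v, new_v):
        if after != before:
            return name
    return "none"


print(classify_bump("4.55.2", "4.57.0"))  # minor
```

A minor bump should be backward compatible under semantic versioning, but the release notes above include behavior changes (for example, raising an error instead of a warning for the meta device), so running CI before merging is still worthwhile.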

dependabot bot added the dependencies and python labels Oct 3, 2025

dependabot bot had a problem deploying to docker-s3-upload October 3, 2025 21:03 Failure

dependabot bot mentioned this pull request Oct 3, 2025

[Dependabot] Update(deps): Bump transformers from 4.55.2 to 4.56.2 #2647

Closed

meta-cla bot added the cla signed label Oct 3, 2025
