chore(deps): Bump torchao from 0.11 to 0.14.1 #189

dependabot · 2025-10-24T21:12:46Z

Bumps torchao from 0.11 to 0.14.1.

Release notes

v0.14.1

Highlights

We are excited to announce the 0.14.1 release of torchao! This release adds support for MoE training on Blackwell GPUs and NVFP4 QAT!

(Prototype) MoE training on Blackwell GPUs

We’ve added a quantized building block for speeding up MoE training on Blackwell GPUs: torchao’s `_scaled_grouped_mm`! It is a differentiable drop-in replacement for `torch._grouped_mm` that dynamically quantizes inputs using the given recipe, performs a scaled grouped GEMM, then returns the results in original precision. This results in significant speedups (see benchmarks below)!
import torch
from torch.nn import functional as F
from torchao.prototype.moe_training import (
    _scaled_grouped_mm as torchao_scaled_grouped_mm
)
from torchao.prototype.moe_training.conversion_utils import MoEScalingType
from torchao.prototype.moe_training.utils import generate_jagged_offs

num_groups, total_M, N, K = 8, 131072, 8192, 5120

# A = input activations, B = expert weights
A = torch.randn(total_M, K, dtype=torch.bfloat16, device="cuda", requires_grad=True)
B = torch.randn(num_groups, N, K, dtype=torch.bfloat16, device="cuda", requires_grad=True)

# Token group offsets computed by router in actual MoE layer
offs = generate_jagged_offs(num_groups, total_M, device="cuda")

# Forward and backward example
out = torchao_scaled_grouped_mm(
    A,
    B.transpose(-2, -1),
    offs=offs,
    scaling_type=MoEScalingType.MXFP8,
)
labels = torch.ones_like(out)
loss = F.mse_loss(out, labels)
loss.backward()

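
For readers unfamiliar with grouped GEMMs, the following is a minimal CPU sketch of what a grouped matmul over jagged token groups computes, without any quantization. It is not torchao's implementation: `grouped_mm_reference` is a hypothetical helper, and it assumes `offs` holds the cumulative end row of each group (so the last entry equals `total_M`), which is how the example above appears to use it.

```python
import torch


def grouped_mm_reference(A, B_t, offs):
    """Unquantized reference for a grouped matmul (hypothetical helper).

    A:    (total_M, K) token activations, rows sorted by expert/group.
    B_t:  (num_groups, K, N) per-group weights, already transposed.
    offs: (num_groups,) cumulative end row of each group (assumed layout).
    """
    out = A.new_empty(A.shape[0], B_t.shape[-1])
    start = 0
    for g, end in enumerate(offs.tolist()):
        # Rows start:end belong to group g and use that group's weights.
        out[start:end] = A[start:end] @ B_t[g]
        start = end
    return out


torch.manual_seed(0)
num_groups, total_M, N, K = 4, 16, 8, 6
A = torch.randn(total_M, K)
B = torch.randn(num_groups, N, K)
# Hand-written offsets; group 1 is empty (3 -> 3), which is legal.
offs = torch.tensor([3, 3, 10, 16], dtype=torch.int32)

out = grouped_mm_reference(A, B.transpose(-2, -1), offs)
print(out.shape)
```

Per the description above, the quantized op differs in that it dynamically quantizes the inputs and runs a single scaled grouped GEMM rather than a Python loop; the loop here only illustrates the grouping semantics.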
Microbenchmarks (see README for commands to reproduce benchmarks):

Forward + backward pass vs torch._grouped_mm:

~1.4-1.8x faster for Llama4 17bx16e shapes

~1.2-1.4x faster for DeepSeekV3 671b shapes

Full MoE layer forward + backward pass:

~1.4x faster (Llama4 17bx16e shapes, batch_size=8, seq_len=16384)

~1.2x faster (DeepSeekV3 671b shapes, batch_size=8, seq_len=16384).

It’s also already integrated into TorchTitan for E2E training with DeepSeekV3 and Llama4! Just use the command line flag `--model.converters="quantize.grouped_mm.mx"`, which will convert all `torch._grouped_mm` ops to torchao `_scaled_grouped_mm` ops under the hood:

... (truncated)

Commits

d94bb19 Update version to 0.14.1 (#3213)
1a578d1 Fix TORCHAO_SKIP_LOADING_SO_FILES behavior (#3189)
937e3a4 Fix SyntaxWarning during installation / import (#3184)
2dedd73 Update compatibility matrix (#3178)
0294124 Fix setuptools version for docs build (#3150)
c40417e Fix rocm CI (#3136)
a35fe4e Update python to 3.10 (#3119)
483ef10 [Inductor][float8] Support qlinear for float8 in inductor (#2565)
1c55f61 Update test-infra references from main to release branch
c96f2dd enable select for NVFP4Tensor (#3117)
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot merge will merge this PR after your CI passes on it
@dependabot squash and merge will squash and merge this PR after your CI passes on it
@dependabot cancel merge will cancel a previously requested merge and block automerging
@dependabot reopen will reopen this PR if it is closed
@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [torchao](https://github.com/pytorch/ao) from 0.11 to 0.14.1.
- [Release notes](https://github.com/pytorch/ao/releases)
- [Commits](pytorch/ao@v0.11.0...v0.14.1)

---
updated-dependencies:
- dependency-name: torchao
  dependency-version: 0.14.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

dependabot bot added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update python code) labels Oct 24, 2025

dependabot bot requested review from BrandonGroth, andrea-fasoli, chichun-charlie-liu, kcirred, nwang-ibm and tharapalanivel as code owners October 24, 2025 21:12

dependabot bot mentioned this pull request Oct 24, 2025

chore(deps): Bump torchao from 0.11 to 0.14.0 #188

Closed

github-actions bot added the chore label and removed the dependencies (Pull requests that update a dependency file) label Oct 24, 2025
