adaptive parallel decoding

uses an absorbing-state diffusion llm in conjunction with a tiny autoregressive lm for left-to-right generation, generating multiple easy tokens in parallel and slowing down at the hard parts. frequently achieves 5x speedups over pure autoregressive text generation.
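
roughly, each decoding step drafts a block of tokens with the diffusion model in parallel, then keeps the longest prefix that both models are confident about. below is a minimal sketch of that loop (not the repo's actual code; `diffusion_probs`, `ar_probs`, and the fixed acceptance threshold are simplifying assumptions based on the paper's multiplicative-mixture idea):

```python
import torch

def apd_step(diffusion_probs, ar_probs, context, block_size=8, threshold=0.5):
    # 1. diffusion model fills a block of masked positions in one parallel pass.
    #    `diffusion_probs` is a hypothetical helper returning a
    #    (block_size, vocab) tensor of per-position token distributions.
    p_diff = diffusion_probs(context, num_masked=block_size)
    proposal = p_diff.argmax(dim=-1)  # greedy draft for the whole block

    # 2. the tiny autoregressive model scores the same draft left-to-right
    #    in a single forward pass (also a hypothetical helper here).
    p_ar = ar_probs(context, proposal)  # (block_size, vocab)

    # 3. accept the longest prefix where the multiplicative mixture of the
    #    two models' probabilities stays high: easy tokens sail through in
    #    parallel, a hard token cuts the prefix short.
    accepted = []
    for i, tok in enumerate(proposal.tolist()):
        mixed = p_diff[i, tok] * p_ar[i, tok]
        if mixed < threshold and i > 0:  # always accept at least one token
            break
        accepted.append(tok)
    return accepted
```

the adaptivity lives in that early break: long accepted prefixes on easy spans, single tokens on the hard parts.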

can load both absorbing-state (mlm) diffusion models and autoregressive (clm) models with a corresponding diffusion LoRA. make sure the diffusion model and the small autoregressive model are initialized from the same model family and share a tokenizer (except of course mask tokens and such). a good example of this is the default Dream 7B alongside Qwen2.5-0.5B.
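
for reference, loading that default pairing with `transformers` looks roughly like this (an illustrative sketch only; the model ids and `trust_remote_code` flag follow Dream's own docs, but check apd.py for the actual loading logic):

```python
# illustrative loading only; see apd.py for how the repo actually does it.
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer

# Dream ships custom modeling code, hence trust_remote_code
diffusion = AutoModel.from_pretrained(
    "Dream-org/Dream-v0-Instruct-7B", trust_remote_code=True
)
# the tiny autoregressive model from the same (Qwen2.5) family
small_ar = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
# both share the Qwen2.5 vocabulary, plus Dream's mask token
tokenizer = AutoTokenizer.from_pretrained(
    "Dream-org/Dream-v0-Instruct-7B", trust_remote_code=True
)
```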

to run, just install the dependencies and run apd with whatever parameters you desire (the full list can be found at the top of main in apd.py):

```
uv sync
uv run apd.py --prompt="Please explain the Riemann hypothesis"
```

based on: https://arxiv.org/pdf/2506.00413
