uses an absorbing-state diffusion lm in conjunction with a tiny autoregressive lm for left-to-right generation: easy tokens are generated several at a time in parallel, and decoding slows down to one token at a time at the hard parts. frequently achieves ~5x speedups over pure autoregressive generation.
can load both absorbing-state (mlm) diffusion models and autoregressive (clm) models with a corresponding diffusion LoRA. make sure the diffusion model and the small autoregressive model are initialized from the same model family and share the same tokenizer (modulo mask tokens and such). a good example of this is the default Dream 7B alongside Qwen2.5-0.5B.
to run, just install the dependencies and run apd.py with whatever parameters you want (the full list is at the top of main):
uv sync
uv run apd.py --prompt="Please explain the Riemann hypothesis"
based on: https://arxiv.org/pdf/2506.00413